Executive Summary


Introduction: 

This is the final report on the work done under contract DASG-60-92-C-0055 from
Phillips Labs and ARPA to the Department of Computer Science at the University of
Maryland. The work started 04/28/92. The goal of this project was to create an
environment for the development and deployment of critical applications with hard real-time
constraints in a reactive environment. We have redesigned the Maruti system to address these
issues. In this report we highlight the achievements of this contract. A publications list
and a copy of each of the publications are also attached.

Application Development Environment:

To support applications in a real-time system, conventional application development
techniques and tools must be augmented with support for specification and extraction of
resource requirements and timing constraints. The application development system
provides a set of programming tools to support and facilitate the development of real-time
applications with diverse requirements. The Maruti Programming Language (MPL) is
used to develop individual program modules. The Maruti Configuration Language (MCL) is
used to specify how individual program modules are to be connected together to form an
application and the details of the hardware platform on which the application is to be executed.

In the current version, the base programming language used is ANSI C. MPL adds
modules, shared memory blocks, critical regions, typed message passing, periodic
functions, and message-invoked functions to the C language. To make analyzing the
resource usage of programs feasible, certain C idioms are not allowed in MPL; in
particular, recursive function calls are not allowed, nor are unbounded loops containing
externally visible events, such as message passing and critical region transitions.

MPL modules are brought together into an executable application by a specification file
written in the Maruti Configuration Language (MCL). The MCL specification determines
the application’s hard real-time constraints, the allocation of tasks, threads, and shared
memory blocks, and all message-passing connections. MCL is an interpreted C-like
language rather than a declarative language, allowing the instantiation of complicated
subsystems using loops and subroutines in the specification.


Analysis and Resource Allocations:

The basic building block of the Maruti computation model is the elemental unit (EU). In
general, an elemental unit is an executable entity which is triggered by incoming data and
signals, operates on the input data, and produces some output data and signals. The
behavior of an EU is atomic with respect to its environment. Specifically:

• All resources needed by an elemental unit are assumed to be required for the entire
length of its execution.

• The interaction of an EU with other entities of the system occurs either before it starts
executing or after it finishes execution.

In order to define complex executions, the EUs may be composed together and properties
specified on the composition. Elemental units are composed by connecting an output port
of an EU with an input port of another EU. A valid connection requires that the input and
output port types are compatible, i.e., they carry the same message type. Such a
connection marks a one-way flow of data or control, depending on the nature of the ports.
A composition of EUs can be viewed as a directed acyclic graph, called an elemental unit
graph (EUG), in which the nodes are the EUs and the edges are the connections between
EUs. An incompletely specified EUG, in which not all input and output ports are
connected, is termed a partial EUG (PEUG). A partial EUG may be viewed as a
higher-level EU. In a complete EUG, all input and output ports are connected and there are no
cycles in the graph. The acyclicity requirement comes from the required time determinacy of
execution: a program with unbounded cycles or recursion may not have a temporally
determinate execution time. Bounded cycles in an EUG are converted into an acyclic graph
by loop unrolling.
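A simple way to picture the acyclicity requirement is a topological sort of the EUG. The C sketch below is not taken from the Maruti tools; the adjacency-matrix encoding, the bound MAX_EU, and the function name eug_topological_order are assumptions for illustration. It applies Kahn's algorithm: EUs with no incoming connections are emitted first, and if some EUs can never be emitted, the graph contains a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EU 32   /* hypothetical upper bound on EUs in this sketch */

/* edges[i][j] is true when an output port of EU i is connected to an
   input port of EU j.  Returns true and fills order[] with a topological
   order if the graph is acyclic; returns false if a cycle remains. */
bool eug_topological_order(size_t n, bool edges[MAX_EU][MAX_EU],
                           size_t order[MAX_EU])
{
    assert(n <= MAX_EU);

    /* Count incoming connections for every EU. */
    size_t indegree[MAX_EU] = {0};
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (edges[i][j])
                indegree[j]++;

    /* Start with the EUs that depend on nothing. */
    size_t queue[MAX_EU], head = 0, tail = 0;
    for (size_t j = 0; j < n; j++)
        if (indegree[j] == 0)
            queue[tail++] = j;

    size_t count = 0;
    while (head < tail) {
        size_t u = queue[head++];
        order[count++] = u;
        /* Removing u may free its successors. */
        for (size_t v = 0; v < n; v++)
            if (edges[u][v] && --indegree[v] == 0)
                queue[tail++] = v;
    }
    return count == n;   /* fewer than n EUs emitted => cycle present */
}
```

If the function returns false, the remaining cycle would have to be removed by unrolling (when it is bounded) before the EUG could be scheduled; an unbounded cycle would have to be rejected.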

Program modules are independently compiled. In addition to the generation of the object
code, compilation also results in the creation of partial EUGs for the modules, i.e., for the
services and entries in the module, as well as the extraction of resource requirements such
as stack sizes of threads, memory requirements, and logical resource requirements.

Given an application specification in the Maruti Configuration Language and the
component application modules, the integration tools are responsible for creating a
complete application program and extracting the resource and timing information for
scheduling and resource allocation. The inputs to the integration process are the program
modules, the partial EUGs corresponding to the modules, the application configuration
specification, and the hardware specifications. The outputs of the integration process are: a
specification for the loader for creating tasks, populating their address spaces, creating the
threads and channels, and initializing the tasks; loadable executables of the program; and the
complete application EUG along with the resource description for the resource allocation
and scheduling subsystem.

After the application program has been analyzed and its resource requirements and
execution constraints identified, it can be allocated and scheduled for a runtime system.

We consider static allocation and scheduling, in which a task is the finest-granularity
object of allocation and an EU instance is the unit of scheduling. In order to make the
execution of instances satisfy the specification and meet the timing constraints, we consider
a scheduling frame whose length is the least common multiple of all tasks’ periods. Because
every period divides the frame length, the schedule for one frame can be repeated indefinitely.
As long as one instance of each EU is scheduled in each period within the scheduling frame
and these executions meet the timing constraints, a feasible schedule is obtained.
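The frame length is plain integer arithmetic. The sketch below assumes that all periods are positive integers in a common time unit; the function names are hypothetical and not part of Maruti. With periods of 20, 50, and 100, for example, the frame is 100 and the three tasks need 5, 2, and 1 instances per frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive periods. */
static uint64_t lcm_u64(uint64_t a, uint64_t b)
{
    assert(a > 0 && b > 0);
    return a / gcd_u64(a, b) * b;   /* divide first to limit overflow */
}

/* Scheduling frame: the LCM of all task periods. */
uint64_t scheduling_frame(const uint64_t *periods, size_t n)
{
    assert(n > 0);
    uint64_t frame = periods[0];
    for (size_t i = 1; i < n; i++)
        frame = lcm_u64(frame, periods[i]);
    return frame;
}

/* One instance per period, so a task needs frame / period instances. */
uint64_t instances_per_frame(uint64_t frame, uint64_t period)
{
    assert(period > 0 && frame % period == 0);
    return frame / period;
}
```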
Maruti Runtime System:

The runtime system provides the conventional functionality of an operating system in a
manner that supports the timely dispatching of jobs. There are two major components of
the runtime system: the Maruti core, which is the operating system code that implements
scheduling, message passing, process control, thread control, and low-level hardware
control; and the runtime dispatcher, which performs resource allocation and scheduling for
dynamic arrivals.

The core of the Maruti hard real-time runtime system consists of three data structures:

• The calendars are created and loaded by the dispatcher. Kernel memory is reserved for
each calendar at the time it is created. Several system calls serve to create, delete,
modify, activate, and deactivate calendars.

• The results table holds timing and status results for the execution of each elemental
unit. The maruti_calandar_results system call reports these results back up to the user
level, usually the dispatcher. The dispatcher can then keep statistics or write a trace file.

• The pending activation table holds all outstanding calendar activation and deactivation
requests. Since requests can arrive before their switch time, the kernel must
track the requests and execute them at the correct time in the correct order.
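The pending activation table can be thought of as a queue ordered by switch time. The C sketch below is a hypothetical model, not the Maruti kernel code: the table size, the sequence number used to keep requests with equal switch times in arrival order, and all names are assumptions. Requests are inserted in sorted position, and pending_process applies every request whose switch time has been reached, oldest first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING   16  /* hypothetical table size */
#define MAX_CALENDARS 8   /* hypothetical number of calendars */

typedef enum { REQ_ACTIVATE, REQ_DEACTIVATE } req_kind;

typedef struct {
    uint64_t switch_time;  /* time at which the request takes effect */
    uint32_t seq;          /* arrival order; breaks ties at equal times */
    int      calendar;     /* calendar affected by the request */
    req_kind kind;
} pending_req;

typedef struct {
    pending_req req[MAX_PENDING];   /* kept sorted by (switch_time, seq) */
    size_t      count;
    uint32_t    next_seq;
    bool        active[MAX_CALENDARS];
} pending_table;

static bool req_before(const pending_req *a, const pending_req *b)
{
    return a->switch_time < b->switch_time ||
           (a->switch_time == b->switch_time && a->seq < b->seq);
}

/* Record a request that may arrive well before its switch time. */
bool pending_add(pending_table *t, uint64_t when, int cal, req_kind kind)
{
    assert(t != NULL);
    if (t->count == MAX_PENDING || cal < 0 || cal >= MAX_CALENDARS)
        return false;
    pending_req r = { when, t->next_seq++, cal, kind };
    size_t i = t->count;
    while (i > 0 && req_before(&r, &t->req[i - 1])) {
        t->req[i] = t->req[i - 1];   /* shift later requests right */
        i--;
    }
    t->req[i] = r;
    t->count++;
    return true;
}

/* Apply, in order, every request whose switch time has been reached.
   Returns the number of requests applied. */
size_t pending_process(pending_table *t, uint64_t now)
{
    assert(t != NULL);
    size_t done = 0;
    while (done < t->count && t->req[done].switch_time <= now) {
        const pending_req *r = &t->req[done];
        t->active[r->calendar] = (r->kind == REQ_ACTIVATE);
        done++;
    }
    /* Drop the applied requests from the front of the table. */
    for (size_t i = done; i < t->count; i++)
        t->req[i - done] = t->req[i];
    t->count -= done;
    return done;
}
```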

The Maruti design includes the concept of scenarios, implemented at runtime as sets of
alternative calendars that can be switched quickly to handle an emergency or a change in
operating mode. These calendars are pre-scheduled and able to begin execution without
having to invoke any user-level machinery. The dispatcher loads the initial scenarios
specified by the application and activates one of them to begin normal execution.
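To make the calendar idea concrete, the sketch below models one calendar as a sorted list of non-overlapping reservations that repeats every frame, and looks up which EU is due at a given time. The structure layout and function names are assumptions for this example. Under this model a scenario would simply be a set of such calendars, and switching scenarios would mean having the dispatcher consult a different, already-loaded calendar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One reservation: EU `eu` owns the processor during
   [start, start + length), measured from the beginning of the frame. */
typedef struct {
    uint64_t start;
    uint64_t length;
    int      eu;
} cal_entry;

typedef struct {
    uint64_t         frame;    /* the calendar repeats with this period */
    const cal_entry *entries;  /* sorted by start, non-overlapping */
    size_t           n;
} calendar;

/* Return the EU reserved at absolute time t, or -1 for an idle slot
   (which could be handed to soft or non-real-time work). */
int calendar_lookup(const calendar *c, uint64_t t)
{
    assert(c != NULL && c->frame > 0);
    uint64_t offset = t % c->frame;

    /* Binary search for the last entry with start <= offset. */
    size_t lo = 0, hi = c->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->entries[mid].start <= offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;   /* offset lies before the first reservation */
    const cal_entry *e = &c->entries[lo - 1];
    return offset < e->start + e->length ? e->eu : -1;
}
```

For a 100-unit frame with reservations [0,10) for EU 1, [20,25) for EU 2, and [50,70) for EU 3, time 122 maps to offset 22 and therefore to EU 2, while time 270 (offset 70) falls in an idle slot.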
1 Introduction

Many complex, mission-critical systems depend not only on correct functional behavior but also
on correct temporal behavior. These systems are called real-time systems. The most critical
systems in this domain are those which must support applications with hard real-time constraints,
in which missing a deadline may cause a fatal error. Due to their criticality, jobs with hard
real-time constraints must always execute satisfying the user-specified timing constraints, despite the
presence of faults such as site crashes or link failures.

A real-time operating system, besides having to support most functions of a conventional
operating system, carries the extra burden of guaranteeing that the execution of its requested jobs
will satisfy their timing constraints. In order to carry out real-time processing, the requirements
of the jobs have to be specified to the system, so that a suitable schedule can be made for the
job execution. Thus, conventional application development techniques must be enhanced to
incorporate support for specification of timing and resource requirements. Further, tools must be
made available to extract these requirements from the application programs and analyze them for
schedulability.

Based on the characteristics of its jobs, a real-time system can be classified as static, dynamic,
or reactive. In a static system, all (hard real-time) jobs and their execution characteristics are
known ahead of time, and thus can be statically analyzed prior to system operation. Many such
systems are built using the cyclic executive or static priority architecture. In contrast, there are
many systems in which new processing requests may be made while the system is in operation. In a
dynamic system, new requests arrive asynchronously and must be processed immediately. However,
since new requests demand immediate attention, such systems must either have “soft” constraints
or be lightly loaded and rely on exception mechanisms for violation of timing constraints. In
contrast, reactive systems have a certain lead time to decide whether or not to accept a newly
arriving processing request. Due to the presence of the lead time, a reactive system can carry out
analysis without adversely affecting the schedulability of currently accepted requests. If adequate
resources are available, the job is accepted for execution. On the other hand, if adequate
resources are not available, the job is rejected and does not execute. The ability to reject new
jobs distinguishes a reactive system from a completely dynamic system.
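The accept/reject decision of a reactive system can be illustrated with a single-processor timeline. The sketch below shows only one possible policy, chosen here for brevity: it searches first-fit for a gap that fits the new job between its ready time and deadline without disturbing existing reservations. The names, the interval representation, and the first-fit rule are all assumptions, not Maruti's scheduling algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, end; } interval;   /* half-open [start, end) */

/* busy[] holds reservations already guaranteed, sorted by start and
   non-overlapping.  Find the earliest start s with ready <= s and
   s + cost <= deadline that overlaps no reservation.  On success store s
   and return true (accept); otherwise return false (reject).  The caller
   is expected to add the new reservation to busy[] after accepting. */
bool admit_first_fit(const interval *busy, size_t n,
                     uint64_t ready, uint64_t cost, uint64_t deadline,
                     uint64_t *start_out)
{
    assert(busy != NULL || n == 0);
    assert(start_out != NULL);

    uint64_t s = ready;
    for (size_t i = 0; i < n; i++) {
        if (busy[i].end <= s)
            continue;               /* reservation lies entirely before s */
        if (s + cost <= busy[i].start)
            break;                  /* the gap before busy[i] is large enough */
        s = busy[i].end;            /* otherwise try right after busy[i] */
    }
    if (s + cost > deadline)
        return false;               /* reject: the job would miss its deadline */
    *start_out = s;
    return true;
}
```

For instance, with reservations at [0,10), [15,30), and [40,50), a request with ready time 0, cost 8, and deadline 100 would be placed at 30, while the same request with deadline 35 would be rejected, leaving the existing guarantees untouched.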

The purpose of the Maruti project is to create an environment for the development and
deployment of critical applications with hard real-time constraints in a reactive environment. Such
applications must be able to execute on a platform consisting of distributed and heterogeneous
resources, and operate continuously in the presence of faults.

The Maruti project started in 1988. The first version of the system was designed as an
object-oriented system with suitable extensions for objects to support real-time operation. The
proof-of-concept version of this design was implemented to run on top of the Unix operating system and
supported hard and non-real-time applications running in a distributed, heterogeneous environment.
The feasibility of the fault tolerance concepts incorporated in the design of the Maruti system was
also demonstrated. No changes to the Unix kernel were made in that implementation, which was
operational in 1990. We realized that Unix is not a very hospitable host for real-time applications,
as very little control over the use of resources can be exercised in that system without extensive
modifications to the kernel. Therefore, based on the lessons learned from the first design, we
proceeded with the design of the current version of Maruti and changed the implementation base
to CMU Mach, which permitted more direct control of resources.

Most recently, we have implemented Maruti directly on 486 PC hardware, providing Maruti
applications total control over resources. The initial version of the distributed Maruti has also
been implemented, allowing Maruti applications to run across a network in a synchronized, hard
real-time manner.

In this paper, we summarize the design philosophy of the Maruti system and discuss the
design and implementation of Maruti. We also present the development tools and operating system
support for mission-critical applications. While the system is being designed to provide integrated
support for multiple requirements of mission-critical applications, we focus our attention on
real-time requirements on a single-processor system.

2 Maruti Design Goals

The design of a real-time system must take into consideration the primary characteristics of the
applications which are to be supported. The design of Maruti has been guided by the following
application characteristics and requirements.

• Real-Time Requirements. The most important requirement for real-time systems is the
capability to support the timely execution of applications. In contrast with many existing
systems, the next-generation systems will require support for hard, soft, and non-real-time
applications on the same platform.

• Fault Tolerance. Many mission-critical systems are safety-critical, and therefore have fault
tolerance requirements. In this context, fault tolerance is the ability of a system to support
continuous operation in the presence of faults.

Although a number of techniques for supporting fault-tolerant systems have been suggested
in the literature, they rarely consider the real-time requirements of the system. A real-time
operating system must provide support for fault tolerance and exception handling capabilities
for increased reliability while continuing to satisfy the timing requirements.

• Distributivity. The inherent characteristics of major systems require that multiple
autonomous computers, connected through a local area network, cooperate in a distributed
manner. The computers and other resources in the system may be homogeneous or
heterogeneous. Due to the autonomous operation of the components which cooperate, system control
and coordination become a much more difficult task than if the system were implemented in
a centralized manner. The techniques learned in the design and implementation of centralized
systems do not always extend to distributed systems in a straightforward manner.

• Scenarios. Many real-time applications undergo different modes of operation during their
life cycle. A scenario defines the set of jobs executing in the system at any given time. A hard
real-time system must be capable of switching from one scenario to another, maintaining the
system in a safe and stable state at all times, without violating the timing constraints.

• Integration of Multiple Requirements. The major challenge in building operating
systems for mission-critical computing is the integration of multiple requirements. Because of
the conflicting nature of some of the requirements and the solutions developed to date,
integration of all the requirements in a single system is a formidable task. For example, the
real-time requirements preclude the use of many of the fault-handling techniques used in other
fault-tolerant systems.

3 Design Approach and Principles

Maruti is a time-based system in which the resources are reserved prior to execution. Resource
reservation is done on the time-line, thus allowing for reasoning about real-time properties in a
natural way. The time-driven architecture provides predictable execution for real-time systems,
a necessary requirement for critical applications requiring hard real-time performance. The basic
design approach is outlined below:

• Resource Reservation for Hard Real-Time Jobs. Hard real-time applications in Maruti
have advance resource reservation, resulting in a priori guarantees about the timely execution
of hard real-time jobs. This is achieved through a calendar data structure which keeps track
of all resource reservations and the assigned time intervals. The resource requirements are
specified as early as possible in the development stage of an application and are manipulated,
analyzed, and refined through all phases of application development.

• Predictability through Reduction of Resource Contention. Hard real-time jobs are
scheduled using a time-driven scheduling paradigm in which the resource contention between
jobs is eliminated through scheduling. This results in reduced runtime overheads and leads
to a high degree of predictability. However, not all jobs can be pre-scheduled. Since resources
may be shared between jobs in the calendar and other jobs in the system, such as non-real-time
activities, there may be resource contention leading to lack of predictability. This is
countered by eliminating as much resource contention as possible and reducing it whenever
it is not possible to eliminate it entirely. The remaining lack of predictability is compensated
for by allowing enough slack in the schedule.

• Integrated Support for Fault Tolerance. Fault tolerance objectives are achieved by
integrating the support for fault tolerance at all levels in the system design. Fault tolerance
is supported by early fault detection and handling, resilient application structures through
redundancy, and the capability to switch modes of operation. Fault detection capabilities
are integrated into the application during its development, permitting the use of
application-specific fault detection and fault handling. As fault handling may result in violation of
temporal constraints, replication is used to make the application resilient. The failure of one
replica then need not affect the timely execution of the other replicas and, thereby, the operation
of the system it may be controlling. Under anticipated load and failure conditions, it may become
necessary for the system to revoke the guarantees given to the hard real-time applications and change
its mode of operation dynamically so that an acceptable degraded mode of operation may
continue.

• Separation of Mechanism and Policy. In the design of Maruti, an emphasis has been
placed on separating mechanism from policy. Thus, for instance, the system provides basic
dispatching mechanisms for a time-driven system, keeping the design of specific scheduling
policies separate. The same approach is followed in other aspects of the system. By
separating the mechanism from the policy, the system can be tailored and optimized to different
environments.

• Portability and Extensibility. Unlike many other real-time systems, the aim of the Maruti
project has been to develop a system which can be tailored for use in a wide variety of
situations, from small embedded systems to complex mission-critical systems. With the rapid
change in hardware technology, it is imperative that the design be portable to
different platforms and make minimal assumptions about the underlying hardware platform.
Portability and extensibility are also enhanced by using a modular design with well-defined
interfaces. This allows for integration of new techniques into the design with relative ease.

• Support of Hard, Soft, and Non-Real-Time in the Same Environment. Many critical
systems consist of applications with a mix of hard, soft, and non-real-time requirements. Since
they may be sharing data and resources, they must execute within the same environment. The
approach taken in Maruti is to support the integrated execution of applications with multiple
requirements by reducing and bounding the unpredictable interaction between them.

• Support for Distributed Operation. Many embedded systems require several processors.
When multiple processors function autonomously, their use in hard real-time applications
requires operating system support for coordinated resource management. Maruti provides
coordinated, time-based resource management of all resources in a distributed environment,
including the processors and the communication channels.

• Support for Multiple Execution Environments. Maruti provides support for multiple
execution environments to facilitate program development as well as execution. Real-time
applications may execute in the Maruti/Mach or Maruti/Standalone environments and maintain
a high degree of temporal determinacy. The Maruti/Standalone environment is best suited
for embedded applications, while Maruti/Mach permits the concurrent execution of hard
real-time and non-real-time Unix applications. In addition, the Maruti/Virtual environment
has been designed to aid the development of real-time applications. In this environment the
same code which runs in the other two environments can execute while access to all Unix
debugging tools is available. In this environment, temporal accuracy is maintained with respect
to a virtual real-time.

• Support for Temporal Debugging. When an application executes in the Maruti/Virtual
environment, its interactions are carried out with respect to virtual real-time, which is under
the control of the user. The user may speed it up with respect to actual time or slow it down.
The virtual time may be paused at any instant and the debugging tools used to examine
the state of the execution. In this way we may debug an application while maintaining all
temporal relationships, a process we call temporal debugging.
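Virtual real-time of the kind described above can be modeled as a clock derived from the host clock with an adjustable rate. The sketch below is a hypothetical illustration, not the Maruti/Virtual implementation: the rational speed factor num/den, the anchoring scheme, and the function names are assumptions. Re-anchoring on every change keeps virtual time continuous, so pausing or changing speed never makes it jump or run backwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Virtual time derived from a monotonic host clock.  While running,
   virtual time advances at rate num/den relative to host time; while
   paused it stands still.  All times share one unit (e.g. microseconds). */
typedef struct {
    uint64_t host_anchor;  /* host time of the last re-anchoring */
    uint64_t virt_anchor;  /* virtual time at host_anchor */
    uint32_t num, den;     /* speed factor num/den (2/1 = twice as fast) */
    bool     paused;
} virtual_clock;

/* host_now is assumed never to be earlier than host_anchor. */
uint64_t vclock_now(const virtual_clock *vc, uint64_t host_now)
{
    if (vc->paused)
        return vc->virt_anchor;
    uint64_t elapsed = host_now - vc->host_anchor;
    return vc->virt_anchor + elapsed * vc->num / vc->den;
}

/* Freeze the current virtual time as the new anchor. */
static void vclock_rebase(virtual_clock *vc, uint64_t host_now)
{
    vc->virt_anchor = vclock_now(vc, host_now);
    vc->host_anchor = host_now;
}

void vclock_set_speed(virtual_clock *vc, uint64_t host_now,
                      uint32_t num, uint32_t den)
{
    assert(num > 0 && den > 0);
    vclock_rebase(vc, host_now);
    vc->num = num;
    vc->den = den;
}

void vclock_pause(virtual_clock *vc, uint64_t host_now)
{
    vclock_rebase(vc, host_now);
    vc->paused = true;
}

void vclock_resume(virtual_clock *vc, uint64_t host_now)
{
    vc->host_anchor = host_now;   /* host time spent paused is not counted */
    vc->paused = false;
}
```

Because the dispatcher in this model would read only vclock_now, the same calendar would be followed at any speed, and a pause simply stops virtual time while the debugger inspects the program.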

4 Application Development Environment

To support applications in a real-time system, conventional application development techniques
and tools must be augmented with support for specification and extraction of resource
requirements and timing constraints. The application development system provides a set of programming
tools to support and facilitate the development of real-time applications with diverse requirements.
The Maruti Programming Language (MPL) is used to develop individual program modules. The
Maruti Configuration Language (MCL) is used to specify how individual program modules are to
be connected together to form an application and the details of the hardware platform on which
the application is to be executed.

4.1 Maruti Programming Language

Rather than develop completely new programming languages, we have taken the approach of using
existing languages as base programming languages and augmenting them with the Maruti primitives
needed to provide real-time support.

In the current version, the base programming language used is ANSI C. MPL adds modules,
shared memory blocks, critical regions, typed message passing, periodic functions, and
message-invoked functions to the C language. To make analyzing the resource usage of programs feasible,
certain C idioms are not allowed in MPL; in particular, recursive function calls are not allowed,
nor are unbounded loops containing externally visible events, such as message passing and critical
region transitions.

• The code of an application is divided into modules. A module is a collection of procedures,
functions, and local data structures. A module forms an independently compiled unit and
may be connected with other modules to form a complete application. Each module may
have an initialization function which is invoked to initialize the module when it is loaded into
memory. The initialization function may be called with arguments.

• Communication primitives send and receive messages on one-way typed channels. There are
several options for defining channel endpoints that specify what to do on buffer overflow or
when no message is in the channel. The connection of two endpoints is done in the MCL
specification for the application; Maruti ensures that endpoints are of the same type and
are connected properly at runtime.

• Periodic functions define entry points for execution in the application. The MCL specification
for the application determines when these functions execute.

• Message-invoked functions, called services, are executed whenever messages are received on
a channel.

• Shared memory blocks can be declared inside modules and are connected together as specified
in the MCL specification for the application.

• An action defines a sequence of code that denotes an externally observable action of the
module. Actions are used to specify timing constraints in the MCL specification.

• Critical regions are used to safely access and maintain data consistency between executing
entities. Maruti ensures that no two entities are scheduled to execute inside their critical
regions at the same time.
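The type check on channel endpoints can be sketched as follows. This is a hypothetical model for illustration only: comparing message types by name, the endpoint structure, and channel_valid are assumptions rather than Maruti interfaces. Following the MCL description of a channel as one sender and one or more receivers, the check requires a single sending endpoint and that every receiver carry the sender's message type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { END_SEND, END_RECEIVE } end_role;

/* Hypothetical description of a channel endpoint declared in a module. */
typedef struct {
    const char *module;    /* owning module instance */
    const char *msg_type;  /* declared message type name */
    end_role    role;
} endpoint;

/* A channel is valid when it has one sending endpoint, at least one
   receiving endpoint, and every endpoint carries the same message type. */
bool channel_valid(const endpoint *sender,
                   const endpoint *const *receivers, size_t nrecv)
{
    assert(receivers != NULL || nrecv == 0);
    if (sender == NULL || sender->role != END_SEND || nrecv == 0)
        return false;
    for (size_t i = 0; i < nrecv; i++) {
        const endpoint *r = receivers[i];
        if (r == NULL || r->role != END_RECEIVE ||
            strcmp(r->msg_type, sender->msg_type) != 0)
            return false;
    }
    return true;
}
```

In this sketch a mismatch is reported when the configuration is checked, which mirrors the intent that incompatible endpoints never get connected at runtime.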

4.2 Maruti Configuration Language

MPL modules are brought together into an executable application by a specification file written
in the Maruti Configuration Language (MCL). The MCL specification determines the application’s
hard real-time constraints, the allocation of tasks, threads, and shared memory blocks, and all
message-passing connections. MCL is an interpreted C-like language rather than a declarative
language, allowing the instantiation of complicated subsystems using loops and subroutines in the
specification. The key features of MCL include:

• Tasks, Threads, and Channel Binding. Each module may be instantiated any number
of times to generate tasks. The threads of a task are created by instantiating the entries and
services of the corresponding module. An entry instantiation also indicates the job to which
the entry belongs. A service instantiation belongs to the job of its client. The instantiation
of a service or entry requires binding the input and output ports to a channel. A channel
has a single input port indicating the sender and one or more output ports indicating the
receivers. The configuration language uses channel variables for defining the channels. The
definition of a channel also includes the type of communication it supports, i.e., synchronous
or asynchronous.

• Resources. All global resources (i.e., resources which are visible outside a module) are
specified in the configuration file, along with the access restrictions on the resource. The
configuration language allows for binding of resources in a module to the global resources.
Any resources used by a module which are not mapped to a global resource are considered
local to the module.

• Timing Requirements and Constraints. These are used to specify the temporal
requirements and constraints of the program. An application consists of a set of cooperating jobs.
A job is a set of entries (and the services called by the entries) which closely cooperate.
Associated with each job are its invocation characteristics, i.e., whether it is periodic or
aperiodic. For a periodic job, its period and, optionally, the ready time and deadline within the
period are specified. The constraints of a job apply to all component threads. In addition
to constraints on jobs and threads, finer-level timing constraints may be specified on the
observable actions. An observable action may be specified in the code of the program. For
any observable action, a ready time and a deadline may be specified. These are relative to
the job arrival. An action may not start executing before the ready time and must finish
before the deadline. Each thread is an implicitly observable action, and hence may have a
ready time and a deadline.

Apart from the ready time and deadline constraints, programs in Maruti can also specify
relative timing constraints, which constrain the interval between two events. For each
action, the start and end of the action mark the observable events. A relative constraint is
used to constrain the temporal separation between two such events. It may be a relative
deadline constraint, which specifies the upper bound on the time between two events, or a delay
constraint, which specifies the lower bound on the time between the occurrence of the two events.
These interval constraints are closer to event-based real-time specifications, which constrain
the minimum and/or maximum distance between two events and allow for a rich expression
of timing constraints for real-time programs.

• Replication and Fault Tolerance. At the application level, fault tolerance is achieved by
creating resilient applications that replicate parts, or all, of the application. The configuration
language eases the task of achieving fault tolerance by providing mechanisms to replicate
modules and services, thus achieving the desired degree of resiliency. By specifying
allocation constraints, a programmer can ensure that the replicated modules are executed on
different partitions.
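The absolute and relative timing constraints described above can be made concrete with a small checker. This is a minimal C sketch, not MPL syntax: the names `rel_constraint`, `rel_satisfied`, and `abs_satisfied` are hypothetical, and timestamps are assumed to be integers in a common time unit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of a relative constraint between two observable
   events (the start or end of an action). */
typedef enum { REL_DEADLINE, REL_DELAY } rel_kind;

typedef struct {
    rel_kind kind;
    long     bound;   /* assumed: same time unit as the event timestamps */
} rel_constraint;

/* Does the separation between event times t1 and t2 (t2 >= t1) satisfy c? */
static bool rel_satisfied(rel_constraint c, long t1, long t2)
{
    long sep = t2 - t1;
    assert(sep >= 0);
    if (c.kind == REL_DEADLINE)
        return sep <= c.bound;   /* upper bound on the separation */
    return sep >= c.bound;       /* REL_DELAY: lower bound on the separation */
}

/* Ready time and deadline are relative to the job arrival: the action may
   not start before arrival + ready and must finish by arrival + deadline. */
static bool abs_satisfied(long arrival, long ready, long deadline,
                          long start, long finish)
{
    return start >= arrival + ready && finish <= arrival + deadline;
}
```

A relative deadline of 10 accepts events 10 units apart but rejects 11; a delay of 5 rejects events only 4 units apart.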


5  Analysis  and  Resource  Allocation 

This phase involves analyzing the resource allocation and scheduling of a collection of applications
in terms of their real-time and fault-tolerance properties. The properties of the system are analyzed
with respect to the system configuration and the characteristics of the runtime system, and resource
calendars are generated.

The analysis phase converts the application program into fine-grained segments called elemental
units (EUs). All subsequent analysis and resource allocation are based on EUs.

5.1  Elemental  Unit  Model 

The  basic  building  block  of  the  Maruti  computation  model  is  the  elemental  unit  (EU).  In  general, 
an  elemental  unit  is  an  executable  entity  which  is  triggered  by  incoming  data  and  signals,  operates 
on  the  input  data,  and  produces  some  output  data  and  signals.  The  behavior  of  an  EU  is  atomic 
with  respect  to  its  environment.  Specifically: 

• All resources needed by an elemental unit are assumed to be required for the entire length of
its execution.

• The interaction of an EU with other entities of the system occurs either before it starts
executing or after it finishes execution.

The components of an EU are illustrated in Figure 1 and are described below:


Figure 1: Structure of an Elemental Unit


•  Input  and  Output  Ports.  Each  EU  may  have  several  input  and/or  output  ports.  Each 
port  specifies  a  part  of  the  interface  of  the  EU.  The  input  ports  are  used  to  accept  incoming 
input  data  to  the  EU,  while  the  output  ports  are  used  for  feeding  the  output  of  the  EU  to 
other  entities  in  the  system. 

•  Input  and  Output  Monitors.  An  input  monitor  collects  the  data  from  the  input  ports, 
and  provides  it  to  the  main  body.  In  doing  so,  it  acts  as  a  filter,  and  may  also  be  used  for  error 
detection  and  debugging.  The  input  monitors  are  also  used  for  supporting  different  triggering 
conditions  for  the  EU.  Similar  to  input  monitors,  the  output  monitors  act  as  filters  to  the 
outgoing  data.  The  output  monitor  may  be  used  for  error  detection  and  timing  constraint 
enforcement.  The  monitors  may  be  connected  to  other  EUs  in  the  system  and  may  send 
(asynchronous)  messages  to  them  reporting  errors  or  status  messages.  The  receiving  EU  may 
perform  some  error-handling  functions. 

• Main Body. The main body accepts the input data from the input monitor, acts on it, and
supplies the output to the output monitor. It defines the functionality provided by the EU.

Annotated with an elemental unit are its resource requirements and timing constraints, which
are supplied to the resource schedulers. The resource schedulers must ensure that the resources are
made available to the EU at the time of execution and that its timing constraints are satisfied.
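A minimal C sketch of the EU structure in Figure 1 follows. It assumes a single input port and a single output port carrying a hypothetical `msg_t`, models the monitors as filter predicates, and fires the EU atomically: input is filtered before the body runs, and output is released only after the body completes. All names are illustrative assumptions, not the Maruti runtime's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int value; } msg_t;            /* hypothetical port message */

typedef bool  (*monitor_fn)(const msg_t *m);    /* filter: accept or reject */
typedef msg_t (*body_fn)(const msg_t *in);      /* main body */

typedef struct {
    const char *name;
    monitor_fn  in_monitor;    /* NULL means accept all input */
    body_fn     body;
    monitor_fn  out_monitor;   /* NULL means accept all output */
    long        wcet;          /* annotations for the schedulers (unused here) */
    long        deadline;
} eu_t;

/* One atomic firing. Returns false if a monitor rejects the data, which a
   real monitor would also report to an error-handling EU. */
static bool eu_fire(const eu_t *eu, const msg_t *in, msg_t *out)
{
    assert(eu != NULL && eu->body != NULL);
    if (eu->in_monitor && !eu->in_monitor(in))
        return false;
    msg_t result = eu->body(in);
    if (eu->out_monitor && !eu->out_monitor(&result))
        return false;
    *out = result;             /* output becomes visible only at the end */
    return true;
}

/* Example monitor and body used below. */
static bool  non_negative(const msg_t *m) { return m->value >= 0; }
static msg_t doubler(const msg_t *in)     { msg_t o = { in->value * 2 }; return o; }
```

An EU built from `non_negative` and `doubler` turns an input of 21 into 42 and rejects -1 at its input monitor.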

5.2  Composition  of  EUs 

In order to define complex executions, EUs may be composed together and properties specified
on the composition. Elemental units are composed by connecting an output port of one EU to
an input port of another EU. A valid connection requires that the input and output port types
are compatible, i.e., that they carry the same message type. Such a connection marks a one-way flow
of data or control, depending on the nature of the ports. A composition of EUs can be viewed as
a directed acyclic graph, called an elemental unit graph (EUG), in which the nodes are the EUs
and the edges are the connections between EUs. An incompletely specified EUG, in which not all input
and output ports are connected, is termed a partial EUG (PEUG). A partial EUG may be
viewed as a higher-level EU. In a complete EUG, all input and output ports are connected and there
are no cycles in the graph. The acyclic requirement comes from the required time determinacy of
execution: a program with unbounded cycles or recursion may not have a temporally determinate
execution time. Bounded cycles in an EUG are converted into an acyclic graph by loop unrolling.
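Because a complete EUG must be acyclic, a tool can check a composition by attempting a topological sort. The sketch below uses Kahn's algorithm over a hypothetical adjacency matrix (`adj[i][j]` nonzero when an output port of EU i feeds an input port of EU j). It illustrates the acyclicity check only and is not the Maruti integrator's code; `MAX_EU` is an arbitrary bound.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EU 16   /* arbitrary bound for the sketch */

/* Returns true if the EU graph is acyclic and writes a topological order
   (a valid precedence order for the EUs) into order[]. */
static bool eug_topo_order(int n, int adj[MAX_EU][MAX_EU], int order[MAX_EU])
{
    int indeg[MAX_EU] = { 0 };
    int queue[MAX_EU], head = 0, tail = 0, count = 0;

    assert(n >= 0 && n <= MAX_EU);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (adj[i][j])
                indeg[j]++;
    for (int i = 0; i < n; i++)
        if (indeg[i] == 0)
            queue[tail++] = i;          /* EUs with no predecessors */
    while (head < tail) {
        int u = queue[head++];
        order[count++] = u;
        for (int v = 0; v < n; v++)
            if (adj[u][v] && --indeg[v] == 0)
                queue[tail++] = v;      /* all predecessors of v emitted */
    }
    return count == n;                  /* fewer than n emitted => a cycle */
}
```

Each EU enters the queue at most once, so a queue of `MAX_EU` entries suffices.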

The composition of EUs supports higher-level abstractions and the properties associated with
them. By carefully choosing the abstractions, the task of developing applications and ensuring that
the timing and other operational constraints are satisfied can be greatly simplified. In Maruti, we
have chosen the following abstractions:

• A thread is a sequential composition of elemental units. It has a sequential flow of control
which is triggered by a message to the first EU in the thread. The flow of control is terminated
with the last EU in the thread. Two adjacent EUs of a thread are connected by a single link
carrying the flow of control. The component elemental units may receive messages from or send
messages to elemental units outside the thread. All EUs of a thread share the execution stack
and processor state.

• A job is a collection of threads which cooperate with each other to provide some functionality.
The partial EUGs of the component threads are connected together in a well-defined manner
to form a complete EUG. All threads within a job operate under a global timing constraint
specified for the job.

5.3  Program  Analysis 

Program modules are independently compiled. In addition to the generation of the object code,
compilation also results in the creation of partial EUGs for the modules, i.e., for the services and
entries in the module, as well as the extraction of resource requirements such as stack sizes for
threads, memory requirements, and logical resource requirements.

Invocation of an entry point or a service call starts a new thread of execution. A control flow
graph is generated for each service and entry. The control flow graph and the MPL primitives are
used to delineate EU boundaries. Note that an EU execution is atomic, i.e., all resources required
by the EU are assumed to be used for the entire duration of its execution. Further, all input
messages are assumed to be logically received at the start of an EU and all output messages are
assumed to be logically sent at the end of an EU. At compilation time, the code for each entry
and service is broken up into one or more elemental units. The delineation of EU boundaries is
done in a manner that ensures that no cycles are formed in the resultant EUG. Thus, for instance,
a send followed by a receive within the same EU may result in a cyclic precedence (the message
received may depend on the one sent, yet all inputs of an EU logically precede all of its outputs)
and must be prevented. We follow certain rules of thumb to delineate EU boundaries, which may be
overridden and explicitly changed by the user. EU boundaries are created at a receive statement, at
the beginning and end of a resource block, and at the beginning and end of an observable action. For
each elemental unit a symbolic name is generated and is used to identify it. The predecessors and
successors of the EU, as well as the source code line numbers associated with the EU, are identified
and stored. The resource and timing requirements that can be identified during compilation are
also stored, and placeholders are created for the remaining information.
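The boundary rules of thumb can be sketched over a straight-line sequence of statements. This simplified C sketch assumes the compiler has already classified each statement (the `stmt_kind` values are hypothetical) and ignores branching in the control flow graph. A new EU starts at each receive and at the start of a resource block or observable action, and the end of such a block closes the current EU.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ST_PLAIN, ST_SEND, ST_RECEIVE,
    ST_RES_BEGIN, ST_RES_END,      /* resource (critical) block */
    ST_ACT_BEGIN, ST_ACT_END       /* observable action */
} stmt_kind;

/* Assigns statement i of a straight-line body to EU eu_of[i];
   returns the number of EUs produced. */
static int delineate_eus(const stmt_kind *body, size_t n, int *eu_of)
{
    int eu = 0;
    int cut_before_next = 0;       /* set after a block-ending statement */

    assert(n == 0 || (body != NULL && eu_of != NULL));
    for (size_t i = 0; i < n; i++) {
        int starts_new = cut_before_next ||
                         body[i] == ST_RECEIVE ||   /* inputs arrive at EU start */
                         body[i] == ST_RES_BEGIN ||
                         body[i] == ST_ACT_BEGIN;
        if (starts_new && i > 0)
            eu++;
        eu_of[i] = eu;
        cut_before_next = (body[i] == ST_RES_END || body[i] == ST_ACT_END);
    }
    return n ? eu + 1 : 0;
}
```

For the sequence plain, send, receive, plain, resource-begin, plain, resource-end, plain, the sketch yields four EUs, and the send and the following receive land in different EUs, which is exactly the cycle the rules are meant to prevent.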

Given an application specification in the Maruti Configuration Language and the component
application modules, the integration tools are responsible for creating a complete application
program and extracting the resource and timing information for scheduling and resource allocation.
The inputs to the integration process are the program modules, the partial EUGs corresponding
to the modules, the application configuration specification, and the hardware specifications. The
outputs of the integration process are: a specification for the loader for creating tasks, populating
their address spaces, creating the threads and channels, and initializing the task; loadable
executables of the program; and the complete application EUG along with the resource descriptions for
the resource allocation and scheduling subsystem.

5.4  Communication  Model 

Maruti  supports  message  passing  and  shared  memory  models  for  communication. 

• Message Passing. Maruti supports the notion of one-way message passing between elemental
units. Message passing provides a location-independent and architecture-transparent
communication paradigm. A channel abstraction is used to specify a one-way message
communication path between a sender and a receiver. A one-way message-passing channel is set
up by declaring the output port on the sender EU, the input port on the receiver EU, and
the type of message. The communication is asynchronous with respect to the sender, i.e., the
sender does not block.

•  Synchronous  Communication.  Synchronous  communication  is  used  for  tightly  coupled 
message  passing  between  elemental  units  of  the  same  job.  For  every  invocation  of  the  sender 
there  is  an  invocation  of  the  receiver  which  accepts  the  message  sent  by  the  sender.  The 
receiver  is  blocked  (de-scheduled)  until  message  arrival  under  normal  circumstances.  The 
messages  in  a  synchronous  communication  channel  are  delivered  in  FIFO  order. 

• Asynchronous Communication. Asynchronous communication may be used for message
passing between elemental units not belonging to the same job. It may also be used between
real-time and non-real-time jobs. In such communication, neither the sender nor the receiver
is blocked (i.e., there is no synchronization). Since the sender and receiver may execute at
different rates, it is possible that no finite number of buffers suffices. Hence, an asynchronous
communication channel is inherently lossy. The receiver may specify its input port to be
inFirst or inLast to indicate which messages to drop when the buffers are full. The first
(oldest) message is dropped in an inLast channel, while the last (newest) message is dropped in
an inFirst channel.

There may be multiple receivers of a message, thus allowing for multicast messages. Similar
to a one-to-one channel, a multicast channel may also be synchronous or asynchronous. All
receivers of a multicast message must be of the same type.

• Shared Memory. Shared memory is also supported in Maruti. The simplest way to share
memory between EUs is to allow them to exist within the same address space. We use the task
abstraction for this purpose. A task consists of multiple threads operating within it, sharing
the address space. The task serves as an execution environment for the component threads.
A thread may belong to only one task. In addition to the shared memory within a task, inter-task
sharing is also supported through the creation of shared memory partitions. A shared
memory partition is a shared buffer which can be accessed by any EU permitted to do so.


The  shared  memory  partitions  provide  an  efficient  way  to  access  data  shared  between  multiple 
EUs.  The  shared  memory  communication  paradigm  provides  just  the  shared  memory — it  is 
the  user's  responsibility  to  ensure  safe  access  to  the  shared  data.  This  can  be  done  by  defining 
a  logical  resource  and  ensuring  that  the  resource  is  acquired  every  time  the  shared  data  is 
accessed.  By  providing  appropriate  restrictions  on  the  logical  resource,  safe  access  to  data 
can  be  ensured. 
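The drop policies of an asynchronous channel can be sketched with a fixed-size ring buffer. In this C sketch the capacity `CHAN_CAP`, the integer payload, and all names are assumptions. Neither `chan_send` nor `chan_recv` blocks, and a full buffer drops either the newest message (inFirst) or the oldest one (inLast).

```c
#include <assert.h>
#include <stdbool.h>

#define CHAN_CAP 4   /* assumed buffer capacity */

typedef enum { IN_FIRST, IN_LAST } drop_policy;

typedef struct {
    int         buf[CHAN_CAP];
    int         head;      /* index of the oldest buffered message */
    int         count;
    drop_policy policy;
    unsigned    dropped;   /* messages lost so far */
} async_chan;

/* Sender side: never blocks. */
static void chan_send(async_chan *c, int msg)
{
    if (c->count == CHAN_CAP) {
        c->dropped++;
        if (c->policy == IN_FIRST)
            return;                          /* inFirst: drop the newest */
        c->head = (c->head + 1) % CHAN_CAP;  /* inLast: drop the oldest */
        c->count--;
    }
    c->buf[(c->head + c->count) % CHAN_CAP] = msg;
    c->count++;
    assert(c->count <= CHAN_CAP);
}

/* Receiver side: never blocks; returns false when the channel is empty. */
static bool chan_recv(async_chan *c, int *msg)
{
    if (c->count == 0)
        return false;
    *msg = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    return true;
}
```

With a capacity of four, sending 1 through 6 leaves 3, 4, 5, 6 in an inLast channel and 1, 2, 3, 4 in an inFirst channel; in both cases two messages are lost.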

5.5  Resource  Model 

A  distributed  system  consists  of  a  collection  of  autonomous  processing  nodes  connected  via  a  local 
area  network.  Each  processing  node  has  resources  classified  as  processors,  logical  resources,  and 
peripheral  devices.  Logical  resources  are  used  to  provide  safe  access  to  shared  data  structures 
and  are  passive  in  nature.  The  peripheral  devices  include  sensors  and  actuators.  Restrictions 
may  be  placed  on  the  preemptability  of  resources  to  maintain  resource  consistency.  The  type  of 
the  resource  determines  the  restrictions  that  are  placed  on  the  preemptability  of  the  resource  and 
serves  to  identify  operational  constraints  for  the  purpose  of  resource  allocation  and  scheduling.  We 
classify  the  resources  into  the  following  types  based  on  the  restrictions  that  are  imposed  on  their 
usage. 

•  Non-preemptable.  The  inherent  characteristics  of  a  resource  may  be  such  that  it  prevents 
preemptability,  i.e.,  any  usage  of  the  resource  must  not  be  preempted.  Many  devices  require 
non-preemptive  scheduling.  For  resources  which  require  the  use  of  CPU,  this  implies  non- 
preemptive  execution  from  the  time  the  resource  is  acquired  until  the  time  the  resource  is 
released. 

• Exclusive. Unlike a non-preemptable resource, an exclusive resource can be preempted. However,
the resource may not be granted to anyone else in the meantime. A critical section is
an example of a resource which must be used in exclusive mode.

• Serially Reusable. A serially reusable resource can not only be preempted but may also be
granted to another EU. The state of such resources can be preserved and restored when the
resource is granted back.

•  Shared.  A  shared  resource  may  be  used  by  multiple  entities  simultaneously.  In  a  single 
processor  system,  since  only  one  entity  is  executing  at  a  given  time,  there  is  no  distinction 
between  a  shared  resource  and  a  serially  reusable  resource. 
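The preemption restrictions of the four resource types can be summarized as predicates. In this C sketch the enum ordering (most to least restrictive) follows the text; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordered from most to least restrictive. */
typedef enum {
    RES_NON_PREEMPTABLE,
    RES_EXCLUSIVE,
    RES_SERIALLY_REUSABLE,
    RES_SHARED
} res_type;

/* May the holder be preempted while it uses the resource? */
static bool res_preemptable(res_type t) { return t != RES_NON_PREEMPTABLE; }

/* May the resource be granted to another EU while its holder is preempted? */
static bool res_regrant_on_preempt(res_type t) { return t >= RES_SERIALLY_REUSABLE; }

/* May several entities use the resource at the same instant? */
static bool res_concurrent(res_type t) { return t == RES_SHARED; }
```

Each step down the enum relaxes exactly one restriction, which is why a shared resource and a serially reusable one behave identically on a single processor, where the third predicate can never come into play.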

A  non-preemptable  resource  is  the  most  restrictive  and  a  shared  resource  is  the  least  restrictive 
in  terms  of  the  type  of  usage  allowed.  An  application  requesting  the  use  of  a  resource  must 
specify  when  the  resource  is  to  be  acquired,  when  it  is  to  be  released,  and  the  restrictions  on  the 
preemptability  of  the  resource.  The  resource  requirements  for  applications  may  be  specified  at 
different  levels  of  computational  abstractions  as  identified  below. 

•  EU  level.  The  lowest  level  a  resource  requirement  can  be  specified  at  is  the  EU  level. 
A  resource  requirement  specified  at  the  EU  level  implies  that  the  resource  is  acquired  and 
released  within  the  EU.  For  scheduling  purposes,  it  is  assumed  that  the  resource  is  required 
for  the  entire  duration  of  the  execution  of  the  EU. 


•  Thread  Level.  Resource  specification  at  the  thread  level  is  used  for  resources  which  are 
acquired  and  released  by  different  EUs  belonging  to  the  same  thread.  For  instance,  a  critical 
section  may  be  acquired  in  one  EU  and  released  in  another  one. 

• Job Level. Job-level resource specifications are used to specify resources which are not
acquired and released for each invocation of a periodic or sporadic job. Instead, these resources
are acquired at job initialization and released at job termination. For a periodic job,
an implicit resource associated with each thread is its set of thread data structures (including
processor stack and registers).

5.6  Operational  Constraints 

The execution of EUs is constrained through various kinds of operational constraints. Such
constraints may arise out of restricted resource usage or through the operational requirements of the
application. Examples of such constraints are: precedence, mutual exclusion, ready time, and
deadline. They may be classified into the following categories:

• Synchronization Constraints. Synchronization constraints arise out of data and control
dependencies or through resource preemption restrictions. Typical examples of such
constraints are precedence and mutual exclusion.

• Timing Constraints. Many types of timing constraints may be specified at different levels,
i.e., at job level, thread level, or EU level. At the job level, one may specify the ready
time, deadline, and whether the job is periodic, sporadic, or aperiodic. For threads, a ready
time and deadline may be specified relative to the job arrival. Likewise, a ready time and
deadline may be specified for an individual EU. We also support the notion of relative timing
constraints, i.e., constraints on the temporal distance between the execution of two EUs.

• Allocation Constraints. In our model, tasks are allocated to processing nodes. Allocation
constraints are used to restrict the task allocation decisions. Allocation constraints often arise
due to fault-tolerance requirements, where the replicas of EUs must be allocated on different
processing nodes. Similarly, when two tasks share memory, they must be allocated on the
same processing node. Sometimes a task must be bound to a processing node since it uses a
particular resource bound to the node (e.g., a sensor).
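An allocator can test a candidate task-to-node assignment against the three kinds of allocation constraint mentioned above: separation of replicas, co-location of tasks that share memory, and binding to the node that owns a device. The representation below is a hypothetical C sketch, not the format used by the Maruti tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ALLOC_SEPARATE, ALLOC_COLOCATE, ALLOC_BIND } alloc_kind;

typedef struct {
    alloc_kind kind;
    int a;   /* task index */
    int b;   /* second task index, or a node index for ALLOC_BIND */
} alloc_constraint;

/* node_of[t] is the processing node chosen for task t. */
static bool allocation_ok(const int *node_of, const alloc_constraint *cs, size_t n)
{
    assert(node_of != NULL && (n == 0 || cs != NULL));
    for (size_t i = 0; i < n; i++) {
        const alloc_constraint *c = &cs[i];
        switch (c->kind) {
        case ALLOC_SEPARATE:   /* replicas on different nodes */
            if (node_of[c->a] == node_of[c->b]) return false;
            break;
        case ALLOC_COLOCATE:   /* tasks sharing memory on one node */
            if (node_of[c->a] != node_of[c->b]) return false;
            break;
        case ALLOC_BIND:       /* task tied to the node owning a device */
            if (node_of[c->a] != c->b) return false;
            break;
        }
    }
    return true;
}
```

A search-based allocator would call such a check for each candidate assignment and discard those that fail before attempting to schedule them.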

The  operational  constraints  are  made  available  to  the  resource  allocation  and  scheduling  tools 
which  must  ensure  that  the  allocation  and  scheduling  maintains  the  restrictions  imposed  by  the 
constraints.  The  model  does  not  place  any  a  priori  restrictions  on  the  nature  of  the  constraints 
that  may  be  specified.  However,  the  techniques  used  by  the  resource  allocator  and  scheduler  will 
depend  on  the  type  of  constraints  that  can  be  specified. 

5.7  Allocation  and  Scheduling 

After the application program has been analyzed and its resource requirements and execution
constraints identified, it can be allocated and scheduled for a runtime system.

This final phase of program development depends upon the physical characteristics of the hardware
on which the application will be run, for example, the location of devices and the number of
nodes and type of processors on each node in the distributed system.

Maruti uses time-based scheduling, and the scheduler creates a data structure called a calendar,
which defines the execution instances in time for all executable components of the applications to
be run concurrently.

We consider static allocation and scheduling, in which a task is the finest-granularity object
of allocation and an EU instance is the unit of scheduling. In order to make the execution of
instances satisfy the specifications and meet the timing constraints, we consider a scheduling frame
whose length is the least common multiple of all tasks' periods. As long as one instance of each
EU is scheduled in each period within the scheduling frame and these executions meet the timing
constraints, a feasible schedule is obtained.
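The scheduling-frame length is the least common multiple of the task periods, computed pairwise through the greatest common divisor. For example, periods of 10, 15, and 25 time units give a 150-unit frame containing 15, 10, and 6 instances of the respective tasks. This C sketch assumes integer periods; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Scheduling frame = lcm of all task periods. */
static unsigned long scheduling_frame(const unsigned long *period, size_t n)
{
    unsigned long frame = 1;
    for (size_t i = 0; i < n; i++) {
        assert(period[i] > 0);
        /* divide before multiplying to delay overflow */
        frame = frame / gcd_ul(frame, period[i]) * period[i];
    }
    return frame;
}
```

The number of instances of task i to place in the frame is then `frame / period[i]`.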

As a part of the Maruti development effort, a number of scheduling techniques have been
developed and are used for generating schedules and calendars for task sets. These techniques
include the use of temporal analysis and simulated annealing. Schedules for single-processor systems
as well as multiple-processor networks are developed using these techniques.

6  Maruti  Runtime  System 

The runtime system provides the conventional functionality of an operating system in a manner
that supports the timely dispatching of jobs. There are two major components of the runtime
system: the Maruti core, which is the operating system code that implements scheduling, message
passing, process control, thread control, and low-level hardware control; and the runtime dispatcher,
which performs resource allocation and scheduling for dynamic arrivals.

6.1  The  Dispatcher 

The dispatcher carries out the following tasks:

• Resource Management. The dispatcher handles requests to load applications. This involves
creating all the tasks and threads of the application, reserving memory, and loading the
code and data into memory. All the resources are reserved before an application is considered
successfully loaded and ready to run.

• Calendar Management. The dispatcher creates and loads the calendars used by applications
and activates them when the application run time arrives. The application itself can
activate and deactivate calendars for scenario changes.

• Connection Management. A Maruti application may consist of many different tasks using
channels for communication. The dispatcher sets up the connections between the application
tasks using direct shared buffers for local connections or a shared buffer with a communications
agent for remote connections.

• Exception Handling. Rogue application threads may generate exceptions such as missed
deadlines, arithmetic exceptions, stack overflows, and stray accesses to unreserved memory.
These exceptions are normally handled by the dispatcher for all the Maruti application
threads. Various exception handling behaviors can be configured, from terminating the entire
application or just the errant thread, to simply invoking a task-specific handler.


6.2  Core  Organization 

The  core  of  the  Maruti  hard  real-time  runtime  system  consists  of  three  data  structures: 

•  The  calendars  are  created  and  loaded  by  the  dispatcher.  Kernel  memory  is  reserved  for  each 
calendar  at  the  time  it  is  created.  Several  system  calls  serve  to  create,  delete,  modify,  activate, 
and  deactivate  calendars. 

• The results table holds timing and status results for the execution of each elemental unit.
The maruti_calendar_results system call reports these results back up to the user level,
usually to the dispatcher. The dispatcher can then keep statistics or write a trace file.

• The pending activation table holds all outstanding calendar activation and deactivation
requests. Since the requests can come before the switch time, the kernel must track the requests
and execute them at the correct time in the correct order.

The scheduler gains control of the CPU at every clock tick interrupt. At that time, if a Maruti
thread is currently running and its deadline has passed, its execution is stopped and an exception
is raised.

If any pending activations are due to be executed, those requests are handled, thereby changing
the set of active calendars. Then the next calendar entry is checked to see if it is scheduled to
execute at this time. If so, the scheduler switches immediately to the specified thread. If no hard
real-time threads are scheduled to execute, the calendar scheduler falls through to the soft and
non-real-time, priority-based schedulers.

Maruti threads indicate to the scheduler that they have successfully reached the end of their
elemental unit with the maruti_unit_done system call. This call marks the current calendar entry
as done and fills in the time actually used by the thread. The Maruti thread is then suspended
until it next appears in the calendars. Soft and non-real-time threads can be run until the next
calendar entry is scheduled and are executed using priority-based scheduling for the available
time slots.

At all times the Maruti scheduler knows which calendar entry will be the next one to run, so that
the calendars are not continually searched for work. This is recalculated when maruti_unit_done
is called or whenever the set of active calendars changes.
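The per-tick decision described above can be sketched for a single calendar. This is a simplified C model under stated assumptions: one active calendar, entries sorted by start time, integer tick times, and hypothetical names. `cal_unit_done` stands in for the `maruti_unit_done` call, and pending activations are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  thread;     /* thread to dispatch */
    long start;      /* scheduled start time */
    long deadline;   /* latest allowed finish time */
    bool done;
    long used;       /* time actually used, filled in at unit completion */
} cal_entry;

typedef struct {
    cal_entry *entry;       /* sorted by start time */
    size_t     n;
    size_t     next;        /* cached index of the next entry to run */
    size_t     running;     /* index of the running entry, or n if none */
    long       started_at;
} calendar;

enum { RUN_NON_RT = -1, DEADLINE_MISS = -2 };

/* Clock-tick handler: returns the thread to run, RUN_NON_RT to fall through
   to the priority-based schedulers, or DEADLINE_MISS if the running thread
   overran its deadline (the dispatcher would then handle the exception). */
static int cal_tick(calendar *c, long now)
{
    if (c->running < c->n) {
        if (now > c->entry[c->running].deadline) {
            c->running = c->n;            /* stop the errant thread */
            return DEADLINE_MISS;
        }
        return c->entry[c->running].thread;
    }
    if (c->next < c->n && c->entry[c->next].start <= now) {
        c->running = c->next++;           /* advance the cached next entry */
        c->started_at = now;
        return c->entry[c->running].thread;
    }
    return RUN_NON_RT;
}

/* Stand-in for maruti_unit_done: mark the entry done, record the time
   used, and suspend the thread until its next calendar entry. */
static void cal_unit_done(calendar *c, long now)
{
    assert(c->running < c->n);
    c->entry[c->running].done = true;
    c->entry[c->running].used = now - c->started_at;
    c->running = c->n;
}
```

Caching `next` keeps each tick constant-time, which mirrors the text's point that the calendars are not searched for work on every interrupt.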

6.3  Multiple  Scenarios 

The Maruti design includes the concept of scenarios, implemented at runtime as sets of alternative
calendars that can be switched quickly to handle an emergency or a change in operating mode.
These calendars are pre-scheduled and able to begin execution without having to invoke any
user-level machinery. The dispatcher loads the initial scenarios specified by the application and
activates one of them to begin normal execution. However, the application itself can activate and
deactivate scenarios. For example, an application might need to respond instantaneously to the
pressing of an emergency shutdown button. A single system call then causes the immediate suspension
of normal activity and the running of the shutdown code sequence. Calendar activation and deactivation
commands can be issued before the desired switch time. The requests are recorded and the switches
occur at the precise moment specified. This allows the application to ensure smooth transitions at
safe points in the execution.
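Recording switch requests ahead of time and applying them at the requested instant can be sketched as a time-ordered pending table consulted on each tick. The capacities, the boolean activate/deactivate flag, and the function names in this C sketch are assumptions. Requests with the same switch time are applied in the order they were issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8   /* assumed table sizes */
#define MAX_CAL     8

typedef struct {
    long switch_time;
    int  calendar;
    bool activate;       /* false means deactivate */
} activation_req;

typedef struct {
    activation_req req[MAX_PENDING];   /* kept sorted by switch_time */
    size_t         n;
    bool           active[MAX_CAL];
} scenario_state;

/* Record a request ahead of time (stable insertion by switch time). */
static bool request_switch(scenario_state *s, long when, int cal, bool activate)
{
    if (s->n == MAX_PENDING)
        return false;
    assert(cal >= 0 && cal < MAX_CAL);
    size_t i = s->n;
    while (i > 0 && s->req[i - 1].switch_time > when) {
        s->req[i] = s->req[i - 1];
        i--;
    }
    s->req[i] = (activation_req){ when, cal, activate };
    s->n++;
    return true;
}

/* Called from the clock tick: apply every request that is now due. */
static void apply_due_switches(scenario_state *s, long now)
{
    size_t k = 0;
    while (k < s->n && s->req[k].switch_time <= now) {
        s->active[s->req[k].calendar] = s->req[k].activate;
        k++;
    }
    for (size_t j = k; j < s->n; j++)    /* remove the applied requests */
        s->req[j - k] = s->req[j];
    s->n -= k;
}
```

An emergency switch issued ahead of time as "deactivate the normal calendar and activate the shutdown calendar at time T" thus takes effect on the first tick at or after T, with no user-level work at that moment.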


Figure  2:  Maruti  System  Architecture 


7  Maruti  3.0  System  Architecture 

Maruti 3.0, the current version of the operating system, implements most of the above design with
a series of development tools that operate in a Berkeley Unix development environment (NetBSD
1.0) on IBM-compatible 486 or Pentium PCs. Maruti applications can be run stand-alone on the
bare hardware or under a Unix-based debugging environment.

• MPL code is processed by the MPL compiler, a modified version of the GNU C compiler. The
MPL compiler generates both the compiled object code and a partial EUG (PEUG) file that contains
all information extracted from the module for further analysis, including the boundaries of
the elemental units of the program.

• The application's MCL specification is read and interpreted by the integrator. The PEUG file
describing each module used in the application is processed and intermodule type checking is
performed. The integrator generates a file specifying the full application EUG, allocation,
and scheduling constraints.


• The allocator/scheduler reads in the data supplied by the integrator and a description of
the physical system on which to allocate the application. The allocator searches for an
arrangement of elemental units on the nodes of the network that satisfies all the timing and
allocation constraints, considering the computation times for each elemental unit. If a feasible
schedule can be found, a calendar file for each resource is generated. A loader map is also
generated which describes, for the runtime system, each task, thread, shared memory area,
and communications link so that all the resources can be reserved when the application is
loaded.

•  The  computation  time  analyzer  takes  timing  trace  information  generated  by  the  runtime 
system  and  generates  worst-case  execution  times  for  all  the  EUs  of  the  application.  This 
timing  information  can  be  used  in  subsequent  runs  of  the  scheduler  to  refine  the  schedule 
and  verify  its  feasibility  given  changes  in  computation  times.  Use  of  the  timing  tool  during 
testing  leads  to  very  high  confidence  in  the  schedule. 

7.1  Runtime  Environments 

Compiled  and  analyzed  Maruti  applications  can  be  executed  in  multiple  runtime  environments. 

• The Maruti/Virtual runtime environment allows the debugging of Maruti applications within
the development environment. Applications run in virtual real-time under Unix, allowing
temporal debugging, including single-stepping the real-time calendars.

• The Maruti/Mach runtime environment is a modified version of Mach which allows the running
of real-time Maruti programs within the Mach environment, where real-time and
non-real-time tasks can coexist and interact on the same host.

• The Maruti/Standalone runtime environment runs the application on the bare hardware and is
suitable for embedded systems. The application is linked with a minimal Maruti core library
and can be booted directly.


7.2  Maruti/Virtual  Runtime  Environment 

Testing real-time programs in their native embedded environments can be tedious and very
time-consuming because of the lack of debugging facilities and the requirement to reload and reboot
the target computer every time a change is made. Maruti provides a Unix-based runtime system
that allows the execution of Maruti hard-real-time applications from within the Unix development
environment. This Unix execution environment supports the following features:

•  The  Maruti  application  has  direct  control  of  its  I/O  device  hardware. 

• Graphical output and keyboard input can go either to the PC console, as in the Maruti/Standalone
and Maruti/Mach environments, or appear in an X window on any Unix workstation, possibly
across the network.

•  The  application  can  be  run  under  the  Unix  GNU  Debugger,  allowing  the  examination  of 
program  variables  and  stack  traces,  setting  of  breakpoints,  and  post-mortem  analysis. 


Figure  3:  Maruti/Virtual  screen  running  in  the  development  environment 


•  The  application  has  access  to  Lnix  standard  output  so  it  can  print  debug  and  status  messages 
to  the  interactive  session  while  running. 

• The Maruti application runs in virtual real-time; that is, it sees itself running in hard real-time
against  a  virtual  time  base. 

• The virtual time can be manipulated through the runtime system for temporal debugging.
Virtual  time  can  be  slowed  down  or  sped  up,  and  individual  elemental  units  (EU)  or  whole 
calendars  can  be  single-stepped  or  traced. 

7.3  Maruti/Standalone  Real-Time  Environment 

Maruti/Standalone  provides  a  minimal  runtime  system  for  the  execution  of  a  Maruti  application 
on  the  bare  hardware.  The  stand-alone  environment  has  the  following  attributes: 

•  The  stand-alone  version  of  an  application  is  built  from  the  same  object  modules  as  are  used 
in the Unix and Maruti/Mach execution environments.


• All the modules of the application are bound, together with only those routines of the Maruti core
that they need, into one executable suitable for booting directly or converting into ROM.

•  The  application  has  complete  control  of  the  computer  hardware. 

•  The  application  runs  in  hard  real-time  with  very  low  overhead  and  variability. 

•  The  minimal  Maruti/Standalone  core  library  currently  consists  of  about  16  KB  of  code  and 
16  KB  of  data. 

•  The  optional  Maruti  Distributed  Operation  support  (including  network  driver)  is  about  14 
KB  of  code  and  9  KB  of  data. 

• The optional Maruti graphics package currently consists, for the standard VGA version, of 10
KB of code and 20 KB of data (plus 150 KB for a secondary frame buffer for best performance).

7.4  Maruti/Mach  Real-Time  Environment 

The original execution environment for Maruti-2 was a modified version of the CMU Mach 3.0
kernel. Maruti/Mach is potentially useful in hybrid environments in which the real-time components
coexist with Mach and Unix processes on the same CPU. Because of preemptability problems in
CMU Mach, we will not be distributing Maruti/Mach until it can be rehosted onto OSF1/MK
real-time kernels.

The  Maruti/Mach  features  include  the  following: 

•  A  calendar-based  real-time  scheduler  has  been  added  to  the  CMU  Mach  3.0  kernel.  This 
scheduler takes precedence over the existing Mach scheduler, running Maruti elemental units
from  the  calendar  at  the  proper  release  time. 

• The Maruti application and most of the runtime system run as normal Mach user-level tasks
and  threads,  which  are  wired  down  in  memory. 

• The Maruti application may communicate with non-Maruti Unix and Mach processes through
shared memory.

• The Maruti/Mach kernel maintains runtime information for each elemental unit executed,
and  makes  that  information  available  to  the  user-level  code  for  worst-case  computation  time 
analysis. 

•  Parts  of  the  CMU  Mach  kernel  remain  unpreemptable.  Nevertheless,  on  a  dedicated  system 
we can achieve release time variability of about 100 microseconds. The context switch time
is  about  200  microseconds. 

• The new release of OSF Research Institute Mach MK6.0 addresses most of the Mach kernel
preemptability concerns. We will be porting Maruti/Mach to this base in the near future.


8  Future  Directions 


The  Maruti  Project  is  an  ongoing  research  effort.  We  hope  to  extend  the  current  system  in  a 
number  of  possible  directions.  Of  course,  since  this  is  a  research  project,  we  expect  our  ideas  to 
evolve  over  time  as  we  gain  experience  and  get  feedback  from  users. 

8.1  Scheduling  and  Analysis  Extensions 
Preemptable  Scheduling  of  Hard-Real-Time  Tasks 

We are planning to extend our scheduling approach to incorporate controlled preemption of
tasks. To date we have concentrated on non-preemptable execution of tasks, which simplifies
scheduling and eases exclusion problems in application development. However, relying on
non-preemptability for exclusion does not scale to a multiprocessor, as threads running on
different processors can interfere with each other. Controlled preemption is more powerful, as it
allows long-running tasks to be scheduled concurrently with high-frequency tasks. Preemption will
remain under the control of the application.

Language support for atomic actions will be developed to replace the assumption of
non-preemptable EUs. Action statements will serve to delineate sections of code on which precise
timing requirements can be imposed by the application designer. Combined with critical region
statements (already implemented), actions will allow the programmer to specify precisely the
desired timing and resource interrelationships in a manner that, unlike the non-preemptability
assumption, scales to a multiprocessor or network cluster.

We will extend the Maruti run-time system to handle preemptable hard real-time tasks. This
will be done in coordination with the analysis tools, which will generate multiple calendar entries
for each preempted EU. All but the last entry for the EU will be marked as preemptable, and all
but the first will be marked as continuation entries. This is enough information for the run-time
scheduler to handle the preemption correctly and in a controlled manner, even when the EU completes
early.
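The marking rule above can be made concrete with a short C sketch. The `cal_entry` record and its fields are hypothetical, not Maruti's actual calendar format. The sketch shows how one EU, already split by the analysis tools into several slices, would be flagged so that every slice except the last is preemptable and every slice except the first is a continuation.

```c
#include <stddef.h>

/* Hypothetical calendar entry; field names are illustrative, not Maruti's. */
struct cal_entry {
    int eu_id;          /* elemental unit this slice belongs to */
    long start_us;      /* release time of this slice */
    long length_us;     /* time reserved for this slice */
    int preemptable;    /* 1 on every slice except the last */
    int continuation;   /* 1 on every slice except the first */
};

/* Fill out[0..n-1] with the slices of one EU, given their start times and
 * lengths. Returns the number of entries written (0 if n == 0). */
size_t split_eu(int eu_id, const long *starts, const long *lengths,
                size_t n, struct cal_entry *out)
{
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        out[i].eu_id = eu_id;
        out[i].start_us = starts[i];
        out[i].length_us = lengths[i];
        out[i].preemptable = (i + 1 < n);   /* all but the last */
        out[i].continuation = (i > 0);      /* all but the first */
    }
    return n;
}
```

Under this reading, a dispatcher that sees an EU complete early could skip any remaining continuation entries that carry the same `eu_id`. That policy is an assumption of the sketch.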

Integration  of  Time-based  and  Priority-Based  Scheduling 

We plan to integrate time-based and priority-based scheduling in a single framework. To
date we have concentrated on time-based scheduling only. To support other scheduling paradigms
within the time-based framework, we may reserve time slots in the schedule and associate with them
a queue of waiting tasks that are executed on the basis of their priorities. In this way we can implement
rate-monotonic style static priority schemes as well as Earliest-Deadline-First style dynamic priority
schemes within the Maruti framework. However, to ensure that tasks executed under
priority-based scheduling continue to meet their temporal requirements, the analysis
techniques must be extended. We will develop analysis techniques suitable for this purpose.

We will extend the Maruti implementation to support non-calendar schedulers, such as priority-based
or earliest-deadline-first schedulers. These schedulers will run in particular slots
specified in the Maruti calendar, or when the calendar is idle.
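The following C sketch illustrates a priority-based scheduler running inside reserved calendar slots. A slot either names a fixed EU or carries the hypothetical marker `SLOT_RESERVED`. In the reserved case, the ready task with the earliest deadline is chosen. The types and names are assumptions for illustration only.

```c
#include <stddef.h>

#define SLOT_RESERVED (-1)  /* hypothetical marker for a priority-scheduled slot */

/* Hypothetical soft/non-real-time task waiting for a reserved slot. */
struct soft_task {
    int id;
    long deadline;   /* absolute deadline */
    int ready;       /* 1 if waiting to run */
};

/* Earliest-deadline-first pick among ready tasks; returns an index or -1. */
int edf_pick(const struct soft_task *q, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!q[i].ready)
            continue;
        if (best < 0 || q[i].deadline < q[best].deadline)
            best = (int)i;
    }
    return best;
}

/* What runs in a calendar slot: the fixed EU if the slot names one,
 * otherwise the EDF choice from the queue (its id), or -1 if nothing is ready. */
int dispatch_slot(int slot_eu, const struct soft_task *q, size_t n)
{
    if (slot_eu != SLOT_RESERVED)
        return slot_eu;
    int i = edf_pick(q, n);
    return i < 0 ? -1 : q[i].id;
}
```

Replacing the deadline comparison in `edf_pick` with a comparison of static priorities would give a rate-monotonic style policy instead.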


POSIX-RT  Subset  API 


In a related area, we plan to study the use of a subset of the POSIX API as the Maruti API for
soft and non-real-time tasks. We will implement as much of the POSIX-RT API as is appropriate
and practicable.

Asynchronous  Events 

Generally, in a time-based system, events are polled for at the maximum frequency at which they
are expected. This type of event handling is easy to analyze within the time-based framework, and
makes explicit the need to reserve enough time to handle the event stream at its worst-case arrival
rate. At this worst-case rate, polling is more efficient than interrupt-driven event handling because
the interrupt overhead is avoided. However, at low event rates, polling is less efficient and fragments
the CPU idle time (where idle time is defined from the point of view of hard real-time tasks). While
conservation of idle time is not an issue for small controllers, it becomes very important when there
are soft- and non-real-time tasks running in the system.

Currently, Maruti takes the polling approach to ease analysis and to better handle the worst-case
rate. We plan to study the analysis required to accommodate asynchronous events within a
calendar schedule. Our intended approach is to work with a specified maximum frequency, relative
deadline, and computation time for each asynchronous event, and to reserve enough time in the
calendar for the event to occur at its maximum frequency.
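The reservation arithmetic can be sketched in C. Assume the maximum frequency is expressed as a minimum inter-arrival time. Then at most ceil(cycle / min_gap) events can arrive in any half-open window the length of the calendar cycle, so that many computation times must be set aside per cycle. The function names are hypothetical. The sketch also does not decide where in the calendar the reserved time must fall to honor the relative deadline.

```c
/* Worst-case CPU time to reserve per calendar cycle (same time units as the
 * inputs) for a hypothetical event with minimum inter-arrival time min_gap
 * and computation time comp. All inputs are assumed positive. */
long async_reservation(long cycle, long min_gap, long comp)
{
    long max_arrivals = (cycle + min_gap - 1) / min_gap;  /* ceiling division */
    return max_arrivals * comp;
}

/* 1 if the reservation fits in the time left over by the hard real-time
 * EUs, which occupy hard_busy of each cycle. */
int reservation_fits(long cycle, long hard_busy, long min_gap, long comp)
{
    return async_reservation(cycle, min_gap, comp) <= cycle - hard_busy;
}
```

For example, a 100 ms cycle and an event that arrives at most every 30 ms with a 2 ms handler gives ceil(100/30) = 4 arrivals, so 8 ms must be reserved.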

We will extend the Maruti run-time system to register and dispatch event handlers in response
to external events. Included in this extension will be the ability to detect and appropriately handle
overload conditions (i.e., when events occur more frequently than expected).
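One possible shape for overload detection is sketched below in C. It assumes each registered handler carries its declared minimum inter-arrival time. An arrival that comes sooner than that is counted as an overload rather than dispatched. The `event_source` structure and `on_event` function are hypothetical. What the caller does on overload (drop, defer, or switch calendars) is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one registered event handler. */
struct event_source {
    long min_gap;        /* declared minimum inter-arrival time */
    long last_accepted;  /* time of the last accepted event */
    bool seen;           /* false until the first event arrives */
    unsigned overloads;  /* arrivals that violated min_gap */
    void (*handler)(int arg);
};

/* Called on each external event at time now. Runs the handler if the
 * arrival respects the declared rate. Otherwise it records an overload and
 * returns false, so the caller can apply its overload policy. */
bool on_event(struct event_source *s, long now, int arg)
{
    if (s->seen && now - s->last_accepted < s->min_gap) {
        s->overloads++;
        return false;
    }
    s->seen = true;
    s->last_accepted = now;
    if (s->handler)
        s->handler(arg);
    return true;
}
```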

Multi-Dimensional Resource Scheduling Research

A typical real-time application requires several resources in order to execute. While the CPU is the most
critical  resource,  others  have  to  be  made  available  in  a  timely  manner.  Generation  of  schedules  for 
multiple  resources  is  known  to  be  a  difficult  problem.  Our  approach  to  date  has  been  to  develop 
efficient  search  techniques,  such  as  one  based  on  simulated  annealing. 
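The following C sketch illustrates the kind of search involved. It applies simulated annealing to a deliberately simplified one-dimensional problem: assigning tasks with known costs to CPUs so as to minimize the makespan. It is not Maruti's allocator. The cost model, the cooling schedule (start at 100, multiply by 0.995), and the move (reassign one random task) are all assumptions chosen for brevity. A small exp replacement avoids depending on libm.

```c
#include <stddef.h>

#define MAX_TASKS 64
#define MAX_CPUS 8

/* Makespan: the largest total load on any CPU under assignment a. */
static long makespan(const long *cost, const int *a, size_t n, int ncpu)
{
    long load[MAX_CPUS] = {0};
    long worst = 0;
    for (size_t i = 0; i < n; i++)
        load[a[i]] += cost[i];
    for (int c = 0; c < ncpu; c++)
        if (load[c] > worst)
            worst = load[c];
    return worst;
}

/* Deterministic 32-bit linear congruential generator (Numerical Recipes
 * constants) so runs are reproducible; returns a value in [0, 1). */
static double next_rand(unsigned long *state)
{
    *state = (*state * 1664525UL + 1013904223UL) & 0xffffffffUL;
    return (double)*state / 4294967296.0;
}

/* e^(-x) for x >= 0 without linking libm: halve x until small, apply a short
 * Taylor series, then square the result back up. */
static double exp_neg(double x)
{
    int halvings = 0;
    if (x > 700.0)
        return 0.0;
    while (x > 0.5) {
        x /= 2.0;
        halvings++;
    }
    double r = 1.0 - x + x * x / 2.0 - x * x * x / 6.0 + x * x * x * x / 24.0;
    while (halvings-- > 0)
        r *= r;
    return r;
}

/* Simulated annealing over task-to-CPU assignments. Requires 1 <= n <=
 * MAX_TASKS and 1 <= ncpu <= MAX_CPUS. best[] receives the best assignment
 * found; the return value is its makespan. */
long anneal_assign(const long *cost, size_t n, int ncpu, int *best,
                   unsigned long seed)
{
    int cur[MAX_TASKS];
    unsigned long st = seed;
    for (size_t i = 0; i < n; i++)
        cur[i] = best[i] = (int)(i % (size_t)ncpu);      /* round-robin start */
    long cur_cost = makespan(cost, cur, n, ncpu);
    long best_cost = cur_cost;

    for (double temp = 100.0; temp > 0.01; temp *= 0.995) {
        size_t t = (size_t)(next_rand(&st) * (double)n); /* task to move */
        int old = cur[t];
        cur[t] = (int)(next_rand(&st) * ncpu);           /* random target CPU */
        long c = makespan(cost, cur, n, ncpu);
        long delta = c - cur_cost;
        /* Metropolis rule: always take improvements, sometimes take worse. */
        if (delta <= 0 || next_rand(&st) < exp_neg((double)delta / temp)) {
            cur_cost = c;
            if (c < best_cost) {
                best_cost = c;
                for (size_t i = 0; i < n; i++)
                    best[i] = cur[i];
            }
        } else {
            cur[t] = old;                                /* undo rejected move */
        }
    }
    return best_cost;
}
```

While the temperature is high, the search sometimes accepts moves that worsen the makespan. This lets it escape local minima where a purely greedy reassignment would get stuck. The best assignment seen is kept throughout.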

Realistic problems contain a variety of interdependencies among tasks, which must be reflected as
constraints in scheduling. We plan to develop efficient techniques for scheduling the allocation and
deallocation of portions of multidimensional resources. In particular, we will address the problems
of allocating and managing resources, such as memory and disk space, that can accommodate
many entities simultaneously.

Scheduling  System-Specific  Topologies 

In a related area, many communications networks have more complex structures than a simple bus
and cannot be treated as a single dedicated resource. We will study the extension of our scheduling
algorithms to support point-to-point meshes of nodes (with store-and-forward of messages),
switched networks (such as MyriNet), and sophisticated backplanes such as that used in the Intel
Paragon.

We will investigate the use of a general framework for specifying the properties of connection
topologies to the Maruti scheduler. In the worst cases, the scheduler for a complex interconnection
technology may have to be programmed explicitly. To handle such cases, we will develop a modular
interface to our allocator/scheduler into which such backplane-specific schedulers can be plugged.
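One way such a modular interface might look in C is a table of function pointers that each interconnect technology fills in. The `net_scheduler` structure, its `reserve` callback, and the shared-bus example plug-in are hypothetical names used only for illustration. A mesh or switched-network plug-in would implement `reserve` by booking every link along the message's route.

```c
/* Hypothetical plug-in interface: each interconnect technology supplies a
 * routine that reserves a transfer and reports when it completes. */
struct net_scheduler {
    const char *name;
    /* Reserve a transfer of nbytes from src to dst that may start at
     * earliest; return the completion time, or -1 if it cannot be placed. */
    long (*reserve)(void *state, int src, int dst, long nbytes, long earliest);
    void *state;
};

/* Example plug-in: a single shared bus, treated as one dedicated resource. */
struct bus_state {
    long busy_until;      /* time at which the bus is next free */
    long bytes_per_us;    /* bandwidth */
};

static long bus_reserve(void *st, int src, int dst, long nbytes, long earliest)
{
    struct bus_state *b = st;
    (void)src;                            /* a bus ignores topology */
    (void)dst;
    long start = earliest > b->busy_until ? earliest : b->busy_until;
    long dur = (nbytes + b->bytes_per_us - 1) / b->bytes_per_us;
    b->busy_until = start + dur;
    return b->busy_until;
}

/* The allocator/scheduler calls through the interface without knowing
 * which topology it is talking to. */
long schedule_message(struct net_scheduler *ns, int src, int dst,
                      long nbytes, long earliest)
{
    return ns->reserve(ns->state, src, dst, nbytes, earliest);
}
```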

Static  Estimation  of  Execution  Times 

Currently, execution times are derived through extensive testing of the program on the target
hardware environment. Deriving the execution time through static analysis is hampered by the
many data dependencies present in most programs.

We will investigate the use of static analysis to help prove the execution time limits of programs.
While generating a reasonable computation time estimate through static analysis is not feasible in
general, it is possible to get accurate results for large segments of a program. It is also possible to
clearly identify the existing data dependencies so that the programmer can, through program
modifications or directives to the analysis tool, eliminate, curtail, or characterize the data
dependencies well enough to get very useful verification of the time properties of the program.
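Analysis of this kind is commonly done over the program's structure tree. The following C sketch uses hypothetical node types to compute a worst-case cycle count. It sums sequences, takes the maximum over branch arms, and multiplies each loop body by its iteration bound. A loop with an unknown bound (marked -1) stands for an unresolved data dependency and makes the whole result unknown. That is the point at which the programmer would supply a directive or restructure the code. The cost of evaluating branch conditions is ignored for brevity.

```c
#include <stddef.h>

/* Hypothetical program-structure tree for timing analysis. */
enum node_kind { N_BLOCK, N_SEQ, N_BRANCH, N_LOOP };

struct tnode {
    enum node_kind kind;
    long cycles;                 /* N_BLOCK: cost of straight-line code */
    long bound;                  /* N_LOOP: max iterations, or -1 if unknown */
    const struct tnode **kids;   /* N_SEQ/N_BRANCH: children; N_LOOP: kids[0] */
    size_t nkids;
};

/* Worst-case cycle count, or -1 if some loop bound is unknown. */
long wcet(const struct tnode *t)
{
    long total = 0, w;
    switch (t->kind) {
    case N_BLOCK:
        return t->cycles;
    case N_SEQ:                                  /* sum of the parts */
        for (size_t i = 0; i < t->nkids; i++) {
            if ((w = wcet(t->kids[i])) < 0)
                return -1;
            total += w;
        }
        return total;
    case N_BRANCH:                               /* worst arm */
        for (size_t i = 0; i < t->nkids; i++) {
            if ((w = wcet(t->kids[i])) < 0)
                return -1;
            if (w > total)
                total = w;
        }
        return total;
    case N_LOOP:                                 /* bound times body */
        if (t->bound < 0 || (w = wcet(t->kids[0])) < 0)
            return -1;
        return t->bound * w;
    }
    return -1;
}
```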

Temporal  Debugging 

When we develop real-time applications, we need techniques for observing the temporal behavior
of programs. For their functional characteristics we can use standard debuggers, which permit
the observation of the state of execution at any stage. This, however, destroys the temporal
relationships completely. In Maruti/Virtual we provide facilities for controlling the execution
of all parts of an application with respect to a virtual time that advances under the control of
keyboard directives. Thus we can pause execution at any virtual time instant with the assurance
that all temporal relationships with respect to this instant are accurately reflected in the state of
the program. We use the term temporal debugging for this.
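The virtual time base can be sketched as a small C structure. The field and function names are hypothetical. The sketch assumes only that virtual time advances at an adjustable multiple of real time, stands still while paused, and can be single-stepped to the next release time while paused. Re-anchoring on every change keeps virtual time continuous, so slowing down or pausing never makes it jump.

```c
/* Hypothetical virtual clock. While running, virtual time advances at
 * rate times real time; while paused, it does not advance. Times are in
 * microseconds. */
struct vclock {
    long long v_base;     /* virtual time at the last rate/pause change */
    long long r_base;     /* real time at that change */
    double rate;          /* 1.0 = real time, 0.5 = half speed, 2.0 = double */
    int paused;
};

long long vclock_now(const struct vclock *c, long long real_now)
{
    if (c->paused)
        return c->v_base;
    return c->v_base + (long long)((double)(real_now - c->r_base) * c->rate);
}

/* Re-anchor at the current instant so that later changes never make
 * virtual time jump. */
static void rebase(struct vclock *c, long long real_now)
{
    c->v_base = vclock_now(c, real_now);
    c->r_base = real_now;
}

void vclock_set_rate(struct vclock *c, long long real_now, double rate)
{
    rebase(c, real_now);
    c->rate = rate;
}

void vclock_pause(struct vclock *c, long long real_now, int paused)
{
    rebase(c, real_now);
    c->paused = paused;
}

/* Single step: while paused, advance virtual time to the next release
 * (for example, the start of the next EU in the calendar). */
void vclock_step_to(struct vclock *c, long long next_release)
{
    if (c->paused && next_release > c->v_base)
        c->v_base = next_release;
}
```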

We will conduct research on the theoretical aspects of temporal debugging and
consider its implications. In particular, we will study how temporal debugging should capture
the interactions between programs executing in virtual time and external events, which occur
with respect to their own time line. We will also study how the virtual times of
several nodes in a distributed environment should be coordinated.

We will extend our implementation of temporal debugging tools in the Maruti/Virtual environment
to support temporal debugging of distributed programs, and to support fine-grained
modification of the time line.

Dynamic  Schedule  Generation 

We will develop the notion of time horizons to support controlled modification of the hard real-time
calendars at runtime, for programs that generate schedules dynamically. The run-time mechanisms
for modifying the calendars are already implemented. However, several research issues must be
studied before effective use can be made of on-line calendar generation: how to find safe points at
which to switch schedules, and how to schedule the schedulers themselves.
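The following C sketch shows deferred calendar switching. It assumes (this is our assumption, not the report's) that the only safe switch point is the end of the active calendar's cycle. The `calendar` and `cal_switcher` types and the `cal_tick` function are hypothetical. Finding less restrictive safe points is the open research question described above.

```c
#include <stddef.h>

/* Hypothetical calendar handle: a cyclic table with a fixed cycle length. */
struct calendar {
    const char *name;
    long cycle;                      /* length of one pass, must be > 0 */
};

struct cal_switcher {
    const struct calendar *active;
    const struct calendar *pending;  /* requested replacement, or NULL */
    long cycle_start;                /* time the active cycle began */
};

/* Request a new calendar; it is not installed immediately. */
void request_switch(struct cal_switcher *s, const struct calendar *next)
{
    s->pending = next;
}

/* Called by the dispatcher at time now. The pending calendar is installed
 * only when a cycle boundary of the active calendar is reached. */
const struct calendar *cal_tick(struct cal_switcher *s, long now)
{
    while (now - s->cycle_start >= s->active->cycle) {
        s->cycle_start += s->active->cycle;   /* boundary reached */
        if (s->pending) {
            s->active = s->pending;
            s->pending = NULL;
        }
    }
    return s->active;
}
```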


8.2  Fault  Tolerance 

Maruti  currently  supports  several  powerful  mechanisms  for  building  fault  tolerant  applications: 

• Maruti Configuration Language (MCL) constructs allow the application designer to specify
replication of application subsystems with forkers and joiners inserted into the communication
streams, as well as the allocation constraints necessary to correctly partition the replicated
subsystems for the desired level of fault tolerance.

• Maruti Programming Language (MPL) allows the programming of application-specific fault
tolerance components such as forkers and joiners, elemental unit monitors, and channel monitors.

• The run-time system supports multiple calendars, allowing the application to switch to emergency
or fault handling scenarios in real time.
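A forker and joiner for a triplicated subsystem can be as simple as a copy and a majority vote. In the C sketch below, which uses hypothetical names, `fork3` copies one message to three replicas. `vote3` compares their outputs byte for byte and forwards a value that at least two replicas agree on. Byte comparison assumes messages without padding and without floating-point tolerance issues.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical forker: copies one message into each replica's input. */
void fork3(const void *msg, size_t len, void *out[3])
{
    for (int i = 0; i < 3; i++)
        memcpy(out[i], msg, len);
}

/* Hypothetical joiner: returns the index of a replica whose message agrees
 * with at least one other, or -1 if all three disagree (a fault the joiner
 * cannot mask). */
int vote3(const void *m0, const void *m1, const void *m2, size_t len)
{
    if (memcmp(m0, m1, len) == 0 || memcmp(m0, m2, len) == 0)
        return 0;
    if (memcmp(m1, m2, len) == 0)
        return 1;
    return -1;
}
```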

We plan to extend the existing mechanisms by providing tools and new mechanisms to better
automate  the  process  of  building  fault  tolerant  applications.  The  new  features  will  include: 

•  A  library  of  forkers  and  joiners  that  can  be  incorporated  into  applications. 

•  Support  for  multicast  messages. 

• Better support in Maruti Programming Language (MPL) for EU and channel monitors.

• Automatic replication of subsystems, and analysis of fault tolerance properties through MAGIC,
the graphical integrator described below.

8.3  Clock  Synchronization 

Currently, distributed Maruti handles clock drift at boot-up time; thereafter, time-slave nodes
simply adopt the time master's clock periodically. This scheme is suitable for many applications,
but is not ideal for embedded control systems, which will suffer from a discontinuous time jump.

To address this problem, we plan to develop and implement time-synchronization algorithms
that operate concurrently with the distributed real-time program to continually adjust the clocks
on all the nodes, taking into account changes in their relative drift. This will most likely involve a
regular time pulse from a master clock, from which the other nodes continually measure their drift
and fine-tune their tick rates. Since the clock drifts are about one order of magnitude smaller than the
variances in communication latency, a simple algorithm will not suffice here.
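The following C sketch shows why averaging is needed. It estimates the slave's drift from many master pulses with a least-squares line fit, so that latency jitter on individual pulses largely cancels. It then corrects the amount of time added per timer tick instead of stepping the clock. The function names and the least-squares choice are assumptions for illustration, not the algorithm the project will adopt.

```c
#include <stddef.h>

/* Least-squares estimate of the local clock's rate relative to the master.
 * master[i] is the time stamped in the i-th master pulse and local[i] the
 * slave's clock when that pulse arrived. Fitting one line over many pulses
 * averages out latency jitter that would swamp a single-pulse estimate.
 * Returns local ticks per master tick (1.0 = no drift). Requires n >= 2 and
 * distinct master times. */
double drift_rate(const double *master, const double *local, size_t n)
{
    double mx = 0, my = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        mx += master[i];
        my += local[i];
    }
    mx /= (double)n;
    my /= (double)n;
    for (size_t i = 0; i < n; i++) {
        sxx += (master[i] - mx) * (master[i] - mx);
        sxy += (master[i] - mx) * (local[i] - my);
    }
    return sxy / sxx;
}

/* Time to add to the local clock per timer interrupt so that it runs at the
 * master's rate. Changing the rate, not the value, avoids a time jump. */
double corrected_tick(double nominal_tick, double rate)
{
    return nominal_tick / rate;
}
```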

8.4  Heterogeneous  Operation 

We will extend our communications agents and boot protocol to translate typed Maruti messages
between heterogeneous hosts when needed. The off-line Maruti analysis tools already collect information
on the types of the channel endpoints for type-checking the connection. We will carry this
information through to the run-time system for use in those channels that are connected between
heterogeneous nodes.
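The translation step can be sketched in C. Assume the type information carried to the run-time system is reduced to a list of scalar field offsets and sizes (the hypothetical `field_desc`). A communications agent could then swap the byte order of each field when sender and receiver differ. Floating-point format differences, alignment, and nested types are outside this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor derived from the channel's message type. */
struct field_desc {
    size_t offset;
    size_t size;      /* 1, 2, 4 or 8 bytes */
};

static void swap_bytes(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        uint8_t t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Convert a message in place when sender and receiver byte orders differ.
 * Only scalar fields are handled in this sketch. */
void translate_msg(void *msg, const struct field_desc *f, size_t nfields,
                   int sender_little, int receiver_little)
{
    if (sender_little == receiver_little)
        return;
    for (size_t i = 0; i < nfields; i++)
        swap_bytes((uint8_t *)msg + f[i].offset, f[i].size);
}
```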


8.5  MPL/Ada 

We will incorporate Maruti Programming Language (MPL) features and analysis into the Ada 95
programming language as we did for ANSI C in the current MPL, which we will now refer to as
MPL/C. Implementing MPL/Ada will involve the following tasks:

•  A  detailed  design  review  studying  those  features  of  Ada  which  are  compatible  with  Maruti 
and  those  that  are  not,  and  how  best  to  proceed  with  the  implementation  of  MPL/Ada. 

• Port GNU Ada (GNAT) to our NetBSD development environment.

• Implement as much of the Ada run-time environment as is practicable on the Maruti run-time.

• Install hooks into GNAT to extract the resource usage information we need. We expect this
work to draw heavily on the MPL/C work, as GNAT is derived from the same back-end
code base as GNU C.


• Develop and enforce within GNAT those restrictions on Ada constructs needed in order to
preserve  the  properties  needed  for  our  hard  real-time  analysis. 

• Add support for Maruti primitives to the language. Some Maruti primitives might be implementable
directly through existing Ada facilities and thus will not require language extensions.
Figure 4: Prototype Graphical Program Integrator Tool


8.6  Graphical  Tools 

Graphical  Program  Development  Tools 

Currently, Maruti applications are pulled together by an MCL specification, which takes the form
of a procedural language whose primitive operations instantiate and bind together the parts of the
application. This type of specification language is complete, allowing the specification of large, complex
applications connected in arbitrary ways. However, such completeness makes MCL relatively
low-level and tedious to program.

We  are  developing  graphical  program  development  tools  which  allow  the  application  designer  to 
pull together the modules entirely through a graphical user interface, without any MCL programming.
The  on-screen  representation  of  modules  can  be  interconnected  with  channels  and  grouped  into 
hierarchical  subsystems.  The  application  designer  will  be  able  to  zoom  in  and  out  to  view  the 
application  at  several  levels. 

The graphical environment will allow both the integration of existing modules and the development
of the interfaces of modules that have not yet been written. The tools will generate template
MPL code for those modules. In this way the graphical environment functions as a design tool and
program generator as well as an integration environment.

The graphical environment will have fault-tolerance analysis built into it. Single points of failure
will be identified on-screen. The user will be able to replicate entire subsystems at once, with the
forker and joiner modules and allocation constraints introduced into the application automatically
by the system.
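Identifying single points of failure can be sketched as a reachability test on the module graph. In the C code below (hypothetical names), a module is flagged if removing it alone leaves no path from a source module, such as a sensor interface, to a sink module, such as an actuator. The endpoints themselves are not examined. The sketch also assumes the sink is reachable when nothing has failed.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAXN 32

/* Depth-first reachability from src to dst over the channel graph, where
 * adj[i][j] means a channel from module i to module j. The module
 * `removed` is treated as failed and skipped. */
static bool reaches(bool adj[MAXN][MAXN], int n, int src, int dst,
                    int removed, bool *seen)
{
    if (src == dst)
        return true;
    seen[src] = true;
    for (int j = 0; j < n; j++)
        if (adj[src][j] && j != removed && !seen[j] &&
            reaches(adj, n, j, dst, removed, seen))
            return true;
    return false;
}

/* Sets spof[m] for every module m (other than src and dst) whose failure
 * alone cuts every path from src to dst. Returns how many were found. */
int find_spofs(bool adj[MAXN][MAXN], int n, int src, int dst, bool *spof)
{
    int count = 0;
    for (int m = 0; m < n; m++) {
        bool seen[MAXN] = {false};
        spof[m] = false;
        if (m == src || m == dst)
            continue;
        if (!reaches(adj, n, src, dst, m, seen)) {
            spof[m] = true;
            count++;
        }
    }
    return count;
}
```

In a triplicated subsystem the replicas are not flagged, but the forker and joiner are. Their own placement and replication therefore matter.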

This graphical style of application integration will greatly facilitate the building and deployment
of reusable software component modules, built to be easily customized and reintegrated into many
applications. Given a suitable library of reusable component modules and the graphical integrator,
it will be possible for non-programmers to build large custom applications from these parts.

Figure 5: Prototype Graphical Resource Scheduling Tool

Graphical Resource Management Tools

Along with the graphical software development tools, we are pursuing graphical resource management
tools. These are a non-programmer's interface to the advanced Maruti scheduling technology.
The Maruti allocator/scheduler works with the abstract concepts of schedulable entities,
available resources, and various types of constraints on the placement of entities and resources. In
the Maruti operating system, the scheduling entities are EUs, and the resources are CPUs, network,
memory, and devices. In fact, however, any type of entity or resource can be manipulated by the
allocator/scheduler.
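To make the abstraction concrete, the C sketch below schedules generic entities onto generic resources with a simple greedy list scheduler. Each entity has a duration, an optional predecessor, and a bitmask of resources allowed to host it. This is a hypothetical illustration, not the algorithm used by the Maruti allocator/scheduler. Nothing in it is specific to CPUs, so the same code could place classes in rooms or jobs on machines.

```c
#include <stddef.h>

#define MAX_RES 8

/* Hypothetical abstract scheduling problem. Entities are listed in an
 * order consistent with their precedences (a predecessor comes first). */
struct entity {
    long dur;
    int pred;          /* index of predecessor entity, or -1 */
    unsigned allowed;  /* bit r set if resource r may host this entity */
};

struct placement {
    int resource;
    long start;
};

/* Greedy list scheduling: place each entity on the allowed resource where
 * it can start (and so finish) earliest. Returns the overall completion
 * time, or -1 if some entity has no allowed resource. nres <= MAX_RES. */
long list_schedule(const struct entity *e, size_t n, int nres,
                   struct placement *out)
{
    long free_at[MAX_RES] = {0};
    long done = 0;
    for (size_t i = 0; i < n; i++) {
        int p = e[i].pred;
        long ready = p >= 0 ? out[p].start + e[p].dur : 0;
        int best = -1;
        long best_start = 0;
        for (int r = 0; r < nres; r++) {
            if (!(e[i].allowed & (1u << r)))
                continue;                       /* placement constraint */
            long s = ready > free_at[r] ? ready : free_at[r];
            if (best < 0 || s < best_start) {
                best = r;
                best_start = s;
            }
        }
        if (best < 0)
            return -1;
        out[i].resource = best;
        out[i].start = best_start;
        free_at[best] = best_start + e[i].dur;
        if (free_at[best] > done)
            done = free_at[best];
    }
    return done;
}
```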

A  graphical  resource  management  tool  will  allow  the  specification  of  these  entities,  resources, 
and  constraints  on  screen  in  a  way  more  oriented  towards  the  general  user.  With  this  tool  users 
should  be  able  to  use  Maruti  scheduling  technology  to  schedule  classes,  busses,  or  projects,  for 
example. 

We have built a small prototype of the graphical resource manager. The prototype displays the
EU graph input to the scheduler as well as the calendar output of the scheduler. The user can edit
the EU graph and its constraints and reschedule with the click of a button. The resulting resource
calendar is redisplayed.


9  Availability 


We  are  pleased  to  announce  the  availability  of  the  Maruti  3.0  Hard  Real-Time  Operating  System 
and  Development  Environment. 

With Maruti 3.0, we are entering a new phase of our project. We have an operating system
suitable for field use by a wider range of users, and we are embarking on the integration of our
time-based, hard real-time technology with industry standards and more traditional event-based
soft- and non-real-time systems. For this, we are greatly interested in feedback from users on
the direction in which the system should evolve.

For  the  Maruti  3  project,  we  will  be  pursuing  the  integration  of  a  POSIX  interface  for  soft  and 
non-real-time  applications,  the  use  of  Ada  for  Maruti  programming,  support  for  asynchronous 
events and soft/non-real-time schedulers within the time-based framework, and heterogeneous
Maruti  networks. 

For  this  user-oriented  phase  of  the  project  we  will  be  making  regular  releases  of  our  software 
available  to  allow  interested  parties  to  track  and  influence  our  development.  To  begin  this  phase 
we  are  making  our  current  base  hard  real-time  operating  system  and  its  development  environment 
available.  This  is  an  initial  test  release. 

Maruti 3.0 will be made available to interested parties on request, via Internet ftp. Please
send electronic mail to maruti-dist@cs.umd.edu for details. More information about the Maruti
Project, as well as papers and documentation, is available via the World Wide Web at:

http://www.cs.umd.edu/projects/maruti/


9.1  Runtime  System 

The  Maruti  3.0  embeddable  hard  real-time  runtime  system  for  distributed  and  single-node  systems 
includes  the  following  features: 

• The core Maruti runtime system is small: 16 KB of code for the single-node core and 30 KB of code
for the distributed core.

•  The  core  provides  a  calendar-based  scheduler,  threads,  distributed  message  passing  using 
Time Division Multiplexed Access (TDMA) over the network, and tight time synchronization
between  network  nodes. 

•  Also  included  in  the  runtime  system  is  a  graphics  library  suitable  for  system  monitoring 
displays  as  well  as  simulations. 

• Maruti runs on PC-AT compatible computers using the Intel i386 (with i387 coprocessor),
i486DX, or Pentium processors. Distributed operation currently requires a 3Com 3c507 Ethernet
card. The graphics library supports standard VGA and Tseng-Labs ET-4000-based
Super-VGA. Support for other SVGA chipsets is forthcoming.
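The TDMA discipline mentioned above can be illustrated with a few lines of C. For this sketch only, assume a round in which each node owns one equal-length slot in node-id order. A node can then compute the owner of the current slot and the start of its own next slot purely from the synchronized clock. The slot layout and names are hypothetical, not Maruti's actual network schedule.

```c
/* Hypothetical TDMA round: each of nnodes nodes owns one slot of slot_us
 * microseconds, in node-id order, repeating every nnodes * slot_us.
 * Times are assumed non-negative. */

/* Index of the node that owns the slot containing now_us. */
int slot_owner(long long now_us, long long slot_us, int nnodes)
{
    return (int)((now_us / slot_us) % nnodes);
}

/* Start of the node's next slot that has not yet begun (>= now_us). */
long long next_tx_start(long long now_us, long long slot_us, int nnodes,
                        int node)
{
    long long round_us = slot_us * nnodes;
    long long base = now_us - now_us % round_us;   /* start of this round */
    long long start = base + (long long)node * slot_us;
    if (start < now_us)
        start += round_us;                         /* already begun: next round */
    return start;
}
```

This only works because every node agrees on the time, which is why tight time synchronization is listed alongside TDMA.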

9.2 Development Environment

Maruti  3.0  includes  a  complete  development  environment  for  distributed  embedded  hard  real-time 
applications.  The  development  environment  runs  on  NetBSD  Unix  and  includes  the  following: 


The Maruti/Virtual debugging environment. Maruti/Virtual simulates the Maruti runtime system within the
development environment. The system clock in this environment tracks virtual time, which
can be sped up or slowed down in relation to actual time, single-stepped, or stopped.
This allows temporal debugging of the application. Within Maruti/Virtual, traces of the
application scheduling and network traffic can be monitored in the debugging session.

The  ANSI-C  based  Maruti  Programming  Language  (MPL/C).  MPL  adds  modules,  message 
passing primitives, shared memory, periodic functions, message-invoked functions, and exclusion
regions to ANSI C. MPL is processed by a version of the GNU C compiler that has been
modified  to  recognize  the  new  MPL  features,  and  to  output  information  about  the  resources 
used  by  the  MPL  program. 

The Maruti Configuration Language (MCL). MCL allows the system designer to specify the
placement, timing constraints, and interconnections of all the modules in an application.
MCL is a powerful interpreted C-like language, allowing complex, hierarchical configuration
specifications, including replication of components and installation-site-specific sizing of the
application. The MCL processor analyzes the application graph for completeness and type-checks
all connections.

The Maruti Allocator/Scheduler. The Maruti allocation and scheduling tool analyzes the
information generated by the MPL compiler and the MCL integrator to find an allocation
and  scheduling  of  the  tasks  of  a  distributed  application  across  the  nodes  of  a  Maruti  network. 
All  relative  and  global  timing,  exclusion,  and  precedence  constraints  are  taken  into  account 
in  finding  a  schedule,  as  are  the  network  speed  and  scheduling  parameters. 

The  Maruti  Timing  Trace  Analyzer.  The  Timing  Analyzer  calculates  worst-case  computation 
times from timing files output by the runtime system. Computation times are calculated for
each scheduling unit in the application, and these times can be fed back into the
Allocator/Scheduler for more precise scheduling analysis.

The Maruti Runtime Binder (mbind). One of the features of Maruti is the late binding of an
application to a particular runtime system. The same application binaries can be combined
with  different  system  libraries  to  build  a  binary  customized  for  a  particular  application  in 
a  particular  setting.  Only  those  portions  of  the  system  library  needed  for  that  binding  are 
included.  Mbind  manages  this  final  step. 

The  Maruti  Application  Builder  (mbuild).  Mbuild  automates  the  process  of  building  an 
application  by  generating  for  the  programmer  a  customizable  makefile  that  manages  the 
complete  process  of  compiling,  configuring,  scheduling,  and  binding  an  application. 
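The Timing Trace Analyzer's core computation can be sketched in C: for each scheduling unit, take the maximum observed execution time across all trace records. The `trace_rec` layout is hypothetical, not the format of Maruti's timing files. An observed maximum is only a lower bound on the true worst case.

```c
#include <stddef.h>

/* Hypothetical trace record: one execution of one scheduling unit. */
struct trace_rec {
    int unit;      /* scheduling unit id, 0 .. nunits-1 */
    long start;    /* microseconds */
    long end;
};

/* Observed worst-case computation time per unit: the maximum of end - start
 * over all records for that unit. wcet[] must have nunits entries; units
 * with no records stay at 0, and records with out-of-range ids are ignored.
 * Any safety margin added on top is a separate, application-specific
 * choice. */
void observed_wcet(const struct trace_rec *t, size_t n, long *wcet,
                   int nunits)
{
    for (int u = 0; u < nunits; u++)
        wcet[u] = 0;
    for (size_t i = 0; i < n; i++) {
        long d = t[i].end - t[i].start;
        if (t[i].unit >= 0 && t[i].unit < nunits && d > wcet[t[i].unit])
            wcet[t[i].unit] = d;
    }
}
```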
Chapter 1

Introduction 


MPL is based on the ANSI C programming language, with extensions to support modules, real-time
constructs, communication primitives, and shared memory blocks.

The Maruti Configuration Language (MCL) is used to specify how individual program modules
are to be connected together to form an application and the details of the hardware on
which the application is to be executed.


General  Program  Organization 

A complete Maruti system is called an application. Applications can be large, distributed systems
made up of several subsystems. An application is defined by a configuration file, which defines all
the subsystems and their interactions. The following entities make up an application:

Jobs  Jobs are entities in a Maruti application. Jobs are specified in the configuration
file with timing constraints, including the job period. A job is made up of multiple entry
points, which are the threads of execution that will be run for the job.

Modules  A module is made up of entry points, which define code executed as part of a job;
services, which define code executed when a message is received by the module; and
functions, which are called from entries and services.

Tasks  Modules are mapped to tasks (a module may be mapped to more than one task). Each
task executes the entry points and services of the module mapped to it.

Channels  Channels are the communication paths for Maruti applications. Each channel is a
one-way, typed connection between two end points through which messages are passed. The end
points are defined by out and in channel specifiers, and are connected as specified in the
application configuration file. Each end point is associated with one entry or service, and its
message type and channel type are declared in the entry or service header. The types of the in
and out channel specifiers must match.
Regions  Regions are the mechanism for mutual exclusion between Maruti threads: only one thread
can enter a particular region at a time. Two types of regions may be specified: global regions
enforce exclusion for the entire Maruti application, while local regions enforce exclusion only
within a single task.

Shared  buffers  Named  memory  buffers  can  be  shared  between  tasks.  The  buffer  is  mapped  into 
the  address  space  of  each  task  that  uses  that  buffer. 


1.1.1  Maruti  Programming  Language 

Rather  than  develop  completely  new  programming  languages,  we  have  taken  the  approach  of  using 
existing  languages  as  base  programming  languages  and  augmenting  them  with  Maruti  primitives 
needed  to  provide  real-time  support. 

In  the  current  version,  the  base  programming  language  used  is  ANSI  C.  MPL  adds  modules, 
shared  memory  blocks,  critical  regions,  typed  message  passing,  periodic  functions,  and  message- 
invoked  functions  to  the  C  language.  To  make  analyzing  the  resource  usage  of  programs  feasible, 
certain  C  idioms  are  not  allowed  in  MPL;  in  particular,  recursive  function  calls  are  not  allowed 
nor  are  unbounded  loops  containing  externally  visible  events,  such  as  message  passing  and  critical 
region-transitions. 
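A check for the recursion restriction can be sketched as cycle detection in the call graph. The C code below is a hypothetical illustration, not the MPL compiler's implementation. It performs a depth-first search and reports direct or indirect recursion when it finds a call edge back to a function that is still on the current path.

```c
#include <stdbool.h>

#define MAXF 64

/* calls[i][j] is true if function i contains a call to function j.
 * Colors: 0 = unvisited, 1 = on the current DFS path, 2 = finished. */
static bool dfs_cycle(bool calls[MAXF][MAXF], int n, int f, int *color)
{
    color[f] = 1;
    for (int g = 0; g < n; g++) {
        if (!calls[f][g])
            continue;
        if (color[g] == 1)
            return true;                 /* back edge: g calls back into the path */
        if (color[g] == 0 && dfs_cycle(calls, n, g, color))
            return true;
    }
    color[f] = 2;
    return false;
}

/* true if any function is directly or indirectly recursive. */
bool has_recursion(bool calls[MAXF][MAXF], int n)
{
    int color[MAXF] = {0};
    for (int f = 0; f < n; f++)
        if (color[f] == 0 && dfs_cycle(calls, n, f, color))
            return true;
    return false;
}
```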

•  The  code  of  an  application  is  divided  into  modules.  A  module  is  a  collection  of  procedures, 
functions,  and  local  data  structures.  A  module  forms  an  independently  compiled  unit  and 
may  be  connected  with  other  modules  to  form  a  complete  application.  Each  module  may 
have  an  initialization  function  which  is  invoked  to  initialize  the  module  when  it  is  loaded  into 
memory.  The  initialization  function  may  be  called  with  arguments. 

•  Communication primitives send and receive messages on one-way, typed channels. There are
several options for defining channel endpoints that specify what to do on buffer overflow or
when no message is in the channel. The connection of two endpoints is done in the MCL
specification for the application; Maruti ensures that the endpoints are of the same type and
are connected properly at runtime.

•  Periodic  functions  define  entry  points  for  execution  in  the  application.  The  MCL  specification 
for  the  application  will  determine  when  these  functions  execute. 

•  Message-invoked  functions,  called  services,  are  executed  whenever  messages  are  received  on 
a  channel. 

•  Shared  memory  blocks  can  be  declared  inside  modules  and  are  connected  together  as  specified 
in  the  MCL  specifications  for  the  application. 

•  Critical  Regions  are  used  to  safely  maintain  data  consistency  between  executing  entities. 
Maruti  ensures  that  no  two  entities  are  scheduled  to  execute  inside  their  critical  regions  at 
the  same  time. 


1.1.2  Maruti  Configuration  Language 

MPL modules are brought together into an executable application by a specification file written
in the Maruti Configuration Language (MCL). The MCL specification determines the application’s
hard real-time constraints, the allocation of tasks, threads, and shared memory blocks, and all
message-passing  connections.  MCL  is  an  interpreted  C-like  language  rather  than  a  declarative 
language,  allowing  the  instantiation  of  complicated  subsystems  using  loops  and  subroutines  in  the 
specification.  The  key  features  of  MCL  include: 

•  Tasks,  Threads,  and  Channel  Binding.  Each  module  may  be  instantiated  any  number 
of  times  to  generate  tasks.  The  threads  of  a  task  are  created  by  instantiating  the  entries  and 
services  of  the  corresponding  module.  An  entry  instantiation  also  indicates  the  job  to  which 
the entry belongs. A service instantiation belongs to the job of its client. The instantiation
of a service or entry requires binding the input and output ports to a channel. A channel
has a single input port indicating the sender and one or more output ports indicating the
receivers. The configuration language uses channel variables for defining the channels. The
definition of a channel also includes the type of communication it supports, i.e., synchronous
or asynchronous.

•  Resources.  All  global  resources  (i.e.,  resources  which  are  visible  outside  a  module)  are 
specified  in  the  configuration  file,  along  with  the  access  restrictions  on  the  resource.  The 
configuration  language  allows  for  binding  of  resources  in  a  module  to  the  global  resources. 

Any  resources  used  by  a  module  which  are  not  mapped  to  a  global  resource  are  considered 
local  to  the  module. 

•  Timing Requirements and Constraints. These are used to specify the temporal requirements
and constraints of the program. An application consists of a set of cooperating jobs.
A job is a set of entries (and the services called by the entries) which closely cooperate.
Associated with each job are its invocation characteristics, i.e., whether it is periodic or
aperiodic. For a periodic job, its period and, optionally, the ready time and deadline within the
period  are  specified.  The  constraints  of  a  job  apply  to  all  component  threads.  In  addition 
to  constraints  on  jobs  and  threads,  finer  level  timing  constraints  may  be  specified  on  the 
observable  actions.  An  observable  action  may  be  specified  in  the  code  of  the  program.  For 
any  observable  action,  a  ready  time  and  a  deadline  may  be  specified.  These  are  relative  to 
the  job  arrival.  An  action  may  not  start  executing  before  the  ready  time  and  must  finish 
before  the  deadline.  Each  thread  is  an  implicitly  observable  action,  and  hence  may  have  a 
ready  time  and  a  deadline. 

Apart  from  the  ready  time  and  deadline  constraints,  programs  in  Maruti  can  also  specify 
relative timing constraints, those which constrain the interval between two events. For each
action, the start and end of the action mark the observable events. A relative constraint is
used to constrain the temporal separation between two such events. It may be a relative
deadline constraint, which specifies the upper bound on time between two events, or a delay
constraint, which specifies the lower bound on time between the occurrence of the two events.
The interval constraints are closer to the event-based real-time specifications, which constrain
the minimum and/or maximum distance between two events and allow for a rich expression
of timing constraints for real-time programs.
•  Replication and Fault Tolerance. At the application level, fault tolerance is achieved by
creating  resilient  applications  by  replicating  part,  or  all,  of  the  application.  The  configuration 
language  eases  the  task  of  achieving  fault  tolerance,  by  allowing  mechanisms  to  replicate 
the  modules,  and  services,  thus  achieving  the  desired  amount  of  resiliency.  By  specifying 
allocation  constraints,  a  programmer  can  ensure  that  the  replicated  modules  are  executed  on 
different  partitions. 
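The ready-time, deadline, and relative constraints described under Timing Requirements and Constraints all reduce to interval checks on event times. The following plain-C sketch is not MCL syntax: the names action_times, window_ok, and separation_ok are hypothetical, and times are assumed to be integer milliseconds measured from the job arrival.

```c
#include <stdbool.h>

/* Hypothetical record of one observable action: its start and end
   events, in milliseconds relative to the job arrival. */
typedef struct {
    long start;
    long end;
} action_times;

/* Ready-time/deadline window: the action may not start before `ready`
   and must finish by `deadline` (both relative to the job arrival). */
bool window_ok(action_times a, long ready, long deadline)
{
    return a.start >= ready && a.end <= deadline;
}

/* Relative constraint between events e1 and e2 (e2 follows e1).
   min_sep is a delay constraint (lower bound) and max_sep a relative
   deadline (upper bound); pass -1 to leave a bound unused. */
bool separation_ok(long e1, long e2, long min_sep, long max_sep)
{
    long sep = e2 - e1;

    if (min_sep >= 0 && sep < min_sep) return false;
    if (max_sep >= 0 && sep > max_sep) return false;
    return true;
}
```

For example, an action that starts at 10 ms and ends at 40 ms meets a ready time of 10 ms and a deadline of 50 ms, and its start and end events satisfy both a delay of at least 20 ms and a relative deadline of 35 ms.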


Chapter  2 

Tutorial 


2.1  Basic Maruti Program Structure

Maruti applications are built up out of one or more MPL modules, and tied together with a
configuration file written in MCL. We’ll start our tutorial with an explanation of a very simple
application consisting of one module, called simple.mpl. Our simple application will contain a
producer thread that sends out integer data, and a consumer thread, which receives integer values
and prints them out.

The  Module 


    module simple;

    #include <stdio.h>       /* printf */
    #include <stdlib.h>      /* atoi */

    int data;

    int maruti_main(int argc, char **argv)
    {
        if(argc < 1) {
            printf("simple: requires an integer argument\n");
            return 1;        /* non-zero return: load fails */
        }

        data = atoi(argv[0]);
        return 0;
    }

This  first  part  of  the  module  will  be  similar  in  all  Maruti  modules.  The  module  always  starts 
with  the  module  name  declaration.  After  the  module  declaration,  the  MPL  module  is  much  like 
any  ANSI  C  program,  but  with  some  special  Maruti  definitions. 

Every module must contain a function named maruti_main, which initializes the module at load
time. This initialization would normally include things like device probing or painting the screen.
Like the main function of a C program, maruti_main takes an argument count and list as its
parameters, and returns an error code to its environment. In Maruti, the environment is the
system loader, and any non-zero return results in a load failure, in which case the application
will not run. In our example, maruti_main is responsible for setting the initial value of our datum
from the environment, and returning a failure code if there is no argument. Note that argv[0]
holds the first argument supplied by the configuration file, not a program name as in a C main.
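To make the load-time contract concrete, here is a plain-C imitation that compiles outside Maruti. The names simple_init and load_task are hypothetical. The sketch assumes, as the example above does, that argv[0] carries the first argument from the configuration file and that any non-zero return means the load fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the module state of simple.mpl. */
static int data;

/* Plain-C imitation of simple's initializer (hypothetical name). */
int simple_init(int argc, char **argv)
{
    if (argc < 1) {
        fprintf(stderr, "simple: requires an integer argument\n");
        return 1;                /* non-zero: load failure */
    }
    data = atoi(argv[0]);        /* argv[0] is the first argument */
    return 0;
}

/* Hypothetical loader check: the task runs only on a zero return. */
int load_task(int (*init)(int, char **), int argc, char **argv)
{
    return init(argc, argv) == 0;
}
```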

Periodic  Functions 


    entry producer()
    out och: int;
    {
        data++;                 /* produce data */
        send(och, &data);
    }


The producer is a periodic function, or Maruti entry point. It serves as the top-level function
for a Maruti thread that will be invoked repeatedly, with a period specified in the MCL config file
(which we will see below).

The  producer  outputs  its  data  on  a  Maruti  channel,  using  the  builtin  MPL  send  function.  The 
channel och is declared as part of the function header of producer. Maruti channels are declared to
have  a  type,  usually  a  structure  but  in  this  case  a  simple  integer.  All  messages  sent  on  the  channel 
will  be  of  the  same  type. 

Note  that  there  is  no  open,  bind,  or  connect  statement  needed  to  initiate  communication 
on  the  channel.  The  connection  of  the  channel  will  be  specified  in  the  config  file,  and  initiated 
automatically  by  the  runtime  system. 

Message-invoked  Functions 


    service consumer(ich: int, msg)
    {
        printf("consumer got %d\n", *msg);    /* consume data */
    }


The  consumer  is  a  message-invoked  function,  or  Maruti  service.  It  serves  as  the  top-level 
function  for  a  Maruti  thread  that  is  invoked  whenever  there  is  a  message  delivered  on  the  channel 
declared  in  the  function  header.  The  msg  parameter  is  the  name  of  the  pointer  to  the  message 
buffer  that  will  contain  the  delivered  message. 

Since  the  receipt  of  the  invoking  message  is  automatic  for  a  Maruti  service,  the  only  thing  our 
consumer  has  to  do  is  print  out  the  data  value  contained  in  the  message. 
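The producer/consumer pairing can be simulated in ordinary C to show the order of events in one period. This is only a sketch under stated assumptions. int_channel, send_int, and run_period are hypothetical stand-ins for what the Maruti runtime and the MCL configuration provide, and the one-slot channel is an arbitrary choice.

```c
#include <stdio.h>

/* Hypothetical one-slot channel carrying int messages. */
typedef struct {
    int msg;
    int full;
} int_channel;

static int data = 27;          /* initial value, as in simple(27) */
static int last_consumed;

static void send_int(int_channel *ch, const int *msg)
{
    ch->msg = *msg;            /* copy the message into the channel */
    ch->full = 1;
}

/* Service body: receives a pointer to the delivered message. */
static void consumer(const int *msg)
{
    last_consumed = *msg;
    printf("consumer got %d\n", *msg);
}

/* Entry body: produce the next value and send it. */
static void producer(int_channel *och)
{
    data++;
    send_int(och, &data);
}

/* One period of job j: run the entry, then deliver the pending
   message, which invokes the service. */
void run_period(int_channel *c)
{
    producer(c);
    if (c->full) {
        int private_copy = c->msg;  /* receipt into private storage */
        c->full = 0;
        consumer(&private_copy);
    }
}
```

Starting from 27, three calls to run_period print "consumer got 28" through "consumer got 30", matching the first lines of the sample run of the application.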

This  completes  our  simple  module,  but  in  order  to  have  a  Maruti  application,  we  must  have  a 
config  file  that  tells  the  system  how  to  run  our  program. 

The  Config  File 

The config file is written in the Maruti Configuration Language (MCL), an interpreted C-like
language with constructs that allow an application to be built up from pieces and interconnected.
The MCL processor, called the integrator, builds a program graph from the specifications, analyzes
it for type correctness and completeness, and checks for dependency cycles. Here is the config file,
simple.cfg, that goes with our application:


    application simple {
        job j;                      /* declare variables */
        task si;
        channel c;

        init j: period 1 s;         /* specify job parameters */
        start si: simple(27);       /* specify task parameters */

        <> si.producer <c> in j;    /* producer thread */
        <c> si.consumer <>;         /* consumer thread */
    }


The  variables  in  MCL  correspond  to  the  objects  that  make  up  an  application,  such  as  channels, 
tasks,  and  jobs.  As  in  C,  these  variables  must  be  declared  before  they  are  used. 

In Maruti, a job is a logical collection of threads that run with the same period. All entry
functions in the application must be put in some job. The init statement sets the period for a
particular job. In our case, the job j will run once every second.

A task is the runtime instantiation of an MPL module, just as in Unix a process is the runtime
image of a program. Many tasks may be executed from the same module; each will run independently
in the Maruti application. The MCL start command instantiates a task from a module.
In our example, we instantiate one task from the module simple and pass it the initial data value
of 27.

We  instantiate  the  threads  for  the  entry  and  service  functions  inside  a  particular  task,  with 
particular  input  and  output  channels.  In  our  example,  the  statement 

    <> si.producer <c> in j;    /* producer thread */

instantiates the si.producer thread in job j with no input channels and one output channel, c.
Likewise, the statement

    <c> si.consumer <>;         /* consumer thread */

instantiates the si.consumer thread with one input channel, c, and no output channels. Service
functions are not put in a job, but rather inherit the scheduling characteristics of the thread that
is sending to their invoking channel.

The integrator checks that the use of producer and consumer in the config file matches
the declarations in the program module.

Building  and  Running  the  Application 

We can build the simple application by putting simple.mpl and simple.cfg in a directory, and
running the mbuild command there:
    % ls
    simple.cfg  simple.mpl
    % mbuild
    mbuild: extracting module info from MCL file ‘simple.cfg’
    mbuild: creating obj subdirectory for output files,
    mbuild: generating obj/simple-build.mk
    mbuild: running make -f obj/simple-build.mk


Mbuild  takes  care  of  running  the  MPL  compiler,  the  MCL  integrator,  as  well  as  the  analysis 
and  binding  programs  needed  to  build  the  runnable  Maruti  application.  By  default,  mbuild  creates 
both  a  stand-alone  binary  that  can  be  booted  on  the  bare  machine,  and  a  Unix  binary  that  runs  in 
virtual  real  time  from  within  the  Unix  development  environment.  These  different  versions  of  the 
runtime  system  are  called  flavors. 

We can try out the simple application by running the ux+x11 flavor from the command line:

    % obj/simple.ux+x11
    <... startup messages ...>
    consumer got 28
    consumer got 29
    consumer got 30
    consumer got 31
    consumer got 32
    consumer got 33
    consumer got 34
    consumer got 35
    consumer got 36
    consumer got 37
    application quit


The application boots up and outputs the consumer message once every second. We can exit
the application by typing ‘q’.

2.2  Using  the  Graphics  Library 

Many  Maruti  programs  will  want  to  use  the  graphical  screen  as  a  monitor  for  an  embedded  system, 
producing  oscilloscope  or  bar-graph  style  displays,  or  for  animating  a  simulation  or  demonstration. 
Maruti  provides  a  console  graphics  library  as  an  integral  part  of  the  system  to  make  the  development 
of visually oriented applications simpler. Our next example application, clock, demonstrates the
use of the graphics library as well as the use of multiple jobs to take advantage of Maruti’s scheduling
abilities.
Figure 2.1: Display of clock example application


The clock application will display a circular clock face on the screen, with the hour, minute,
and second hands moving as independent Maruti threads in different jobs. The clock screen is
shown in Figure 2.1.

We will now go through the clock.mpl module and see how it works.


    module clock;

    #include <maruti/mtime.h>
    #include <maruti/console.h>
    #include <math.h>
    #include <stdio.h>                      /* sprintf */
    #include <string.h>                     /* strlen */
    #include "clock.h"

    #define CENTER_X (CONSOLE_WIDTH/2)      /* useful constants */
    #define CENTER_Y (CONSOLE_HEIGHT/2)

    void check_for_quitkey(void);           /* subroutines */
    void polar_point(int pos, int radius, int *x, int *y);
    void xor_triangle(int pos, int apex_radius, int color);
    void xor_ray(int pos, int color);

    int sec_pos, min_pos, hour_pos;         /* system state */
The first part of the module is much like any other ANSI C program, with #includes, #defines,
and function prototype declarations for subroutines to be used later in the program. Notice the
two Maruti header files included: <maruti/mtime.h> contains declarations related to Maruti time
management, and <maruti/console.h> contains declarations that define the graphics library
interface. The "clock.h" header, which we’ll see below, will contain definitions that customize the
look of the clock face.


    int maruti_main(int argc, char **argv)
    {
        int i, x1, y1, x2, y2, color;
        char num_str[4];
        mtime curtime;
        mdate curdate;

        /* initialize screen library, paint screen black */
        cons_graphics_init();
        cons_fill_area(0, 0, CONSOLE_WIDTH, CONSOLE_HEIGHT, BLACK);


The maruti_main function in the clock application draws the clock face display and initializes
the system state, in our case the positions of the three clock hands. Before drawing on the screen,
the application must call cons_graphics_init and initialize the contents of the screen. The call
to cons_fill_area does this by filling the entire screen with the color BLACK.


        /* draw tick marks for clock face */
        for(i = 0; i < 60; i++) {
            polar_point(i*POS_PER_TICMARK, OUTER_RADIUS, &x1, &y1);
            polar_point(i*POS_PER_TICMARK, INNER_RADIUS, &x2, &y2);

            if(i % 5) color = GRAY;     /* ordinary second tick */
            else      color = WHITE;    /* hour position */

            cons_draw_line(x1, y1, x2, y2, color);
        }


The next step in initialization is to draw the tick marks for the clock face. There will be sixty
tick lines drawn around the circle, one for each second. Every fifth tick mark will be WHITE to
mark the hour positions, and the rest will be GRAY. The lines are drawn using the cons_draw_line
library routine, which draws a one-pixel-wide line between two points in the desired color.

The locations of the endpoints of our tick marks are calculated using a helper routine, polar_point
(shown below), which calculates the Cartesian coordinates for a given angle and radius. We
conveniently adopt integer angle positions starting from 0 at the top and increasing clockwise up to
60*POS_PER_TICMARK back at the top again.
        /* draw numerals for clock face */
        for(i = 1; i <= 12; i++) {
            sprintf(num_str, "%d", i);

            polar_point(i*5*POS_PER_TICMARK, NUMBER_RADIUS, &x1, &y1);
            y1 -= 8; x1 -= strlen(num_str)*8 / 2;    /* center the string */

            cons_print(x1, y1, num_str, strlen(num_str), YELLOW);
        }


The numerals are placed on the clock face similarly to the tick marks. The cons_print graphics
library function places text on the screen at a given position and color.


        /* initialize the hand positions to current time */
        maruti_get_current_time(&curtime);
        curdate = maruti_time_to_date(curtime);

        sec_pos  = curdate.second * POS_PER_TICMARK;
        min_pos  = curdate.minute * POS_PER_TICMARK +
                   curdate.second * POS_PER_TICMARK / 60;
        hour_pos = (curdate.hour % 12) * 5 * POS_PER_TICMARK +
                   curdate.minute * 5 * POS_PER_TICMARK / 60;

        return 0;
    }


The  final  part  of  the  initialization  is  the  calculation  of  the  initial  placement  of  the  clock  hands. 
The maruti_get_current_time system call returns the current system time, given as an mtime
structure. The system time is kept just as in Unix: as the number of seconds and microseconds
since  the  Epoch  time,  defined  as  00:00  GMT  on  January  1,  1970.  The  maruti_time_to_date  library 
routine  does  the  job  of  calculating  the  date  and  time-of-day  from  an  mtime  value. 
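The arithmetic for the initial hand positions can be checked outside Maruti. In this sketch, the standard C gmtime function stands in for maruti_time_to_date; treating them as equivalent is an assumption, since both convert seconds since the Epoch into calendar fields. hand_positions and positions_from_epoch are hypothetical names.

```c
#include <time.h>

#define NUM_POSITIONS   240
#define POS_PER_TICMARK (NUM_POSITIONS/60)

/* Hand positions in the tutorial's units: 0 at 12 o'clock,
   increasing clockwise, NUM_POSITIONS per revolution. */
typedef struct { int sec, min, hour; } hand_positions;

/* gmtime() is used as a standard-C analog of maruti_time_to_date. */
hand_positions positions_from_epoch(time_t t)
{
    hand_positions p = {0, 0, 0};
    struct tm *d = gmtime(&t);

    if (d == NULL)
        return p;                /* conversion failed: leave at 0 */

    p.sec  = d->tm_sec * POS_PER_TICMARK;
    p.min  = d->tm_min * POS_PER_TICMARK
           + d->tm_sec * POS_PER_TICMARK / 60;
    p.hour = (d->tm_hour % 12) * 5 * POS_PER_TICMARK
           + d->tm_min * 5 * POS_PER_TICMARK / 60;
    return p;
}
```

For instance, 12645 seconds after the Epoch is 03:30:45 UTC. That time gives a second-hand position of 180, a minute-hand position of 123, and an hour-hand position of 70.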


    entry sec_hand()
    {
        static int erase = 0;

        if(erase) xor_ray(sec_pos, WHITE);   /* erase previous hand */
        else      erase = 1;                 /* nothing drawn yet */

        sec_pos = (sec_pos + POS_PER_TICMARK) % NUM_POSITIONS;
        xor_ray(sec_pos, WHITE);

        check_for_quitkey();
    }
The periodic function sec_hand will be run once per second. It erases the previously placed
second-hand ray, calculates the new position and draws the ray there. The check_for_quitkey
subroutine (shown below) will poll the keyboard and exit the application if the space bar is pressed.


    entry min_hand()
    {
        static int erase = 0;

        if(erase) xor_triangle(min_pos, MIN_RADIUS, MIN_COLOR);
        else      erase = 1;

        min_pos = (min_pos + 1) % NUM_POSITIONS;
        xor_triangle(min_pos, MIN_RADIUS, MIN_COLOR);
    }

    entry hour_hand()
    {
        static int erase = 0;

        if(erase) xor_triangle(hour_pos, HOUR_RADIUS, HOUR_COLOR);
        else      erase = 1;

        hour_pos = (hour_pos + 1) % NUM_POSITIONS;
        xor_triangle(hour_pos, HOUR_RADIUS, HOUR_COLOR);
    }


The  min_hand  and  hour_hand  periodic  functions  update  their  respective  hand  positions  by  one 
each  time  they  are  called.  The  second  hand  jumps  forward  one  second  each  time  it  is  called,  but  the 
minute  and  hour  hands  creep  forward  in  smaller  relative  increments  (rather  than  jumping  forward 
once  per  minute  or  hour,  which  would  not  look  right). 


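    void polar_point(int pos, int radius, int *x, int *y)
    {
        /* positions run clockwise from the top; radians run
           counter-clockwise from the right */
        double angle = (2.0*M_PI/NUM_POSITIONS) * (NUM_POSITIONS-pos) + M_PI/2;

        *x = CENTER_X + cos(angle) * radius;
        *y = CENTER_Y - sin(angle) * radius;    /* screen y points down */
    }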


Finally we come to the helper functions. The polar_point function converts from our convenient
“positions” to real angles in radians, taking into account that radians start at the right and
run counter-clockwise, whereas our positions start at the top and run clockwise. Given an angle
in radians and a radius from the center, the x and y coordinates of the point are found by taking
the cosine and sine of the angle. The final twist is that in Cartesian coordinates the y axis points
up, whereas in screen coordinates it traditionally points down, so the y coordinate must be flipped
around.


    void xor_ray(int pos, int color)
    {
        int x, y;

        polar_point(pos, SEC_RADIUS, &x, &y);
        cons_xor_line(CENTER_X, CENTER_Y, x, y, color);
    }


    void xor_triangle(int pos, int apex_radius, int color)
    {
        int xb1, yb1, xb2, yb2, xp, yp;
        int bp1, bp2;

        /* base corners, TRIANGLE_BASEL/2 positions either side of the
           hand; adding NUM_POSITIONS keeps bp2 non-negative */
        bp1 = (pos + TRIANGLE_BASEL/2) % NUM_POSITIONS;
        bp2 = (pos - TRIANGLE_BASEL/2 + NUM_POSITIONS) % NUM_POSITIONS;

        polar_point(bp1, TRIANGLE_BASER, &xb1, &yb1);
        polar_point(bp2, TRIANGLE_BASER, &xb2, &yb2);
        polar_point(pos, apex_radius, &xp, &yp);

        cons_xor_line(xb1, yb1, xb2, yb2, color);    /* base of triangle */
        cons_xor_line(xb1, yb1, xp, yp, color);      /* first arm */
        cons_xor_line(xb2, yb2, xp, yp, color);      /* second arm */
    }


These graphic helper routines draw the line for the second hand and the triangle for the minute
and hour hands. The cons_xor_line routine is similar to cons_draw_line, but exclusive-ors
its pixels with the screen rather than just painting them. The xor technique is often used in
graphics programming because it allows objects to be drawn and erased without disturbing the
background. When multiple objects overlap, the overlapping portions may become a strange color
due to xor’ing, but you are guaranteed that when the objects are erased by xor’ing them a second
time in the same location, whatever color was there before will be restored.
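The restore-on-second-xor property can be demonstrated on a small in-memory pixel array. The framebuffer type and xor_hline are hypothetical. They only stand in for what cons_xor_line does to each pixel it touches.

```c
#include <string.h>

#define FB_W 8
#define FB_H 4

/* Tiny hypothetical framebuffer of 8-bit color indices. */
typedef unsigned char framebuffer[FB_H][FB_W];

/* XOR a horizontal run of pixels (x0..x1 inclusive) on row y. */
void xor_hline(framebuffer fb, int y, int x0, int x1, unsigned char color)
{
    for (int x = x0; x <= x1; x++)
        fb[y][x] ^= color;
}
```

Drawing two overlapping runs tints the overlap with both colors, and erasing them in either order returns every pixel to its original value, since (p ^ a ^ b) ^ a ^ b == p for any colors a and b.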


    void check_for_quitkey(void)
    {
        console_event_t ev;

        if(cons_poll_event(&ev) != 0 && ev.device == EVENT_KEYBOARD
           && ev.keycode == KEY_SPACE)
            quit();
    }
The final helper routine polls the console keyboard for events, and quits the application if the
space bar is pressed. The cons_poll_event system call reports both key press and key release
events, and reports a scan code rather than an ASCII value. This interface is rather low level, but
allows the application complete access to the up/down state of every key on the keyboard.
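Because the console reports separate press and release events, an application that needs to detect distinct key presses has to track each key's state. The sketch below is hypothetical throughout: key_event, its pressed flag, and the EV_* constants are not the Maruti console API. The value 0x39 is the PC scan code set 1 make code for the space bar; whether Maruti reports codes from that set is an assumption.

```c
#include <stdbool.h>

/* Hypothetical event record and device codes (not Maruti's API). */
enum { EV_KEYBOARD = 1, EV_MOUSE = 2 };
#define SCAN_SPACE 0x39          /* PC set-1 make code for space */

typedef struct {
    int device;
    int keycode;                 /* scan code, not ASCII */
    bool pressed;                /* true on key down, false on key up */
} key_event;

/* Update per-key up/down state and report whether this event is a
   fresh press of the space bar (not a repeat while it is held). */
bool is_space_press(const key_event *ev, bool key_down[256])
{
    bool was_down;

    if (ev->device != EV_KEYBOARD || ev->keycode < 0 || ev->keycode > 255)
        return false;
    was_down = key_down[ev->keycode];
    key_down[ev->keycode] = ev->pressed;
    return ev->keycode == SCAN_SPACE && ev->pressed && !was_down;
}
```

While the space bar is held down, further press events are not reported as new presses until a release event has been seen.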

This completes the clock.mpl module. The clock.cfg config file follows:

    #include "clock.h"
    application clock {
        job sec_job;  init sec_job:  period SEC_PERIOD s;     /* jobs */
        job min_job;  init min_job:  period MIN_PERIOD s;
        job hour_job; init hour_job: period HOUR_PERIOD s;

        task ct; start ct: clock;                             /* task */

        <> ct.sec_hand  <> in sec_job;                        /* threads */
        <> ct.min_hand  <> in min_job;
        <> ct.hour_hand <> in hour_job;
    }

Notice  that  the  config  file  can  include  header  files  just  like  the  MPL  module  can.  This  allows 
the  programmer  to  put  configuration-related  constants  in  one  header  and  use  them  in  both  the 
config  file  and  the  application  modules. 

The clock config simply creates one task, plus a job for each hand of the clock. The periods
are defined in "clock.h":

    #define INNER_RADIUS    235
    #define OUTER_RADIUS    (INNER_RADIUS+15)
    #define NUMBER_RADIUS   (OUTER_RADIUS+15)

    #define TRIANGLE_BASER  30
    #define TRIANGLE_BASEL  50

    #define SEC_RADIUS      INNER_RADIUS
    #define MIN_COLOR       YELLOW
    #define MIN_RADIUS      (SEC_RADIUS-50)
    #define HOUR_COLOR      GREEN
    #define HOUR_RADIUS     (MIN_RADIUS-50)

    #define NUM_POSITIONS   240
    #define POS_PER_TICMARK (NUM_POSITIONS/60)

    #define SEC_PERIOD      1                          /* jumps 1 tickmark/sec */
    #define MIN_PERIOD      (60/POS_PER_TICMARK)       /* creeps 1 tickmark/min */
    #define HOUR_PERIOD     (3600/5/POS_PER_TICMARK)   /* creeps 5 tickmarks/hr */


First,  a  number  of  constants  describing  the  visual  appearance  of  the  clock  face  are  defined. 
These  can  be  modified  to  taste. 

Second, the timing characteristics of the program are given. The key parameter is NUM_POSITIONS,
which gives the number of positions which the minute and hour hands take around the clock face.
The larger this number, the smaller the distance the hands move each time, and the more frequently
their jobs are executed. The minute hand must move through all 60 tick marks once every hour,
and the hour hand through 5 tick marks each hour. With NUM_POSITIONS set to 240, each hand moves
four times for each tick mark on the face of the clock, which works out to one move every 15 seconds
for the minute hand, and one move every 180 seconds for the hour hand.
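The period arithmetic can be checked mechanically. This sketch copies the timing constants from "clock.h" into plain C. revolution_seconds is a hypothetical helper that computes how long a hand takes to go once around the face.

```c
/* Timing constants copied from "clock.h". */
#define NUM_POSITIONS   240
#define POS_PER_TICMARK (NUM_POSITIONS/60)
#define SEC_PERIOD      1
#define MIN_PERIOD      (60/POS_PER_TICMARK)
#define HOUR_PERIOD     (3600/5/POS_PER_TICMARK)

/* Seconds for one full revolution of a hand that advances `step`
   positions every `period` seconds (hypothetical helper). */
int revolution_seconds(int step, int period)
{
    return NUM_POSITIONS / step * period;
}
```

Because the periods are computed with integer division, NUM_POSITIONS should be a multiple of 60 for which POS_PER_TICMARK divides 60 evenly. Otherwise the truncated periods would make the minute and hour hands run fast.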


Chapter  3 

MPL/C  Reference 


Maruti Programming Language (MPL) is a simple extension to ANSI C to support modules,
synchronization and communications primitives, and shared memory variables. MPL adds some
restrictions that enable analysis of the CPU and memory requirements of the program. This chapter
will define the MPL-specific features that differ from ANSI C.


3.1  EBNF Syntax Notation

In this manual, syntax is given in Extended Backus-Naur Formalism (EBNF). In this notation:

•  literal  strings  are  quoted,  e.g.  ’module’. 

•  other  terminal  symbols  are  bracketed,  e.g.  <module-name>. 

•  X|Y  denotes  alternatives. 

•  {X} denotes zero-or-more.

•  [X]  denotes  zero-or-one. 

3.2  MPL  Modules 

The  module  is  the  compilation  unit  in  MPL.  It  is  presented  to  the  MPL  compiler  as  one  file,  but 
may  contain  normal  C  #include  directives  so  that  the  parts  of  the  module  can  be  kept  as  distinct 
files. The MPL compiler generates a binary object file for the module, as well as a partial EU graph
file  for  the  module,  which  contains  information  about  the  module  needed  by  the  Maruti  analysis 
tools. 

At  runtime,  each  MPL  module  is  mapped  to  a  Maruti  task,  which  logically  runs  in  its  own 
address  space.  Communication  between  tasks  is  through  channels  or  shared  blocks.  Each  task  can 
contain  multiple  threads  of  execution,  each  thread  corresponding  to  an  entry  or  service  function  of 
MPL. 

Each  module  starts  with  the  module  name  declaration: 

    module-name-spec ::= ’module’ <module-name> .
3.3  Module Initialization

When  the  task  corresponding  to  a  module  is  loaded,  the  Maruti  runtime  system  executes  a  non- 
real-time  initializer  function  provided  by  the  programmer.  The  initializer  is  a  normal  C  function, 
but  it  must  be  present  in  every  module.  It  is  declared  as: 

    int maruti_main(int argc, char **argv);

The  job  of  this  function  is  to  initialize  the  state  of  the  task,  taking  any  parameter  values  into 
account. If the initializer returns 0, then the task is considered successfully loaded; otherwise the
load fails. The initializer thread cannot send or receive messages on Maruti channels.

3.4  Entry  Functions 

Maruti entry functions occur as top-level definitions in the MPL source file, similar in syntax to
normal C function definitions.

    entry-function ::= ’entry’ <entry-name> ’(’ ’)’ entry-function-body .
    entry-function-body ::= channel-declaration-list c-function-body .

Entry  functions  serve  as  the  top-level  function  of  a  Maruti  thread  which  is  invoked  repeatedly 
with  a  period  as  specified  externally,  in  the  MCL  configuration.  Multiple  instances  of  the  entry 
thread  can  be  active  in  a  single  task  at  runtime,  so  care  must  be  taken  to  protect  accesses  to  shared 
data  with  a  region  or  local_region  construct. 

3.5  Service  Functions 

Maruti  service  functions  also  occur  as  top-level  definitions  in  the  MPL  source  file. 

    service-function ::= ’service’ <service-name>
        ’(’ <in-channel-name> ’:’ <type_specifier> ’,’ <msg-ptr-name> ’)’
        service-function-body .

    service-function-body ::= channel-declaration-list c-function-body .

Services are declared with the initiating channel and a pointer to a message buffer. A service
thread is invoked whenever a message on the channel has been received; thus it inherits the
scheduling characteristics of the sender to the channel. Multiple instances of the service may be
active in a single task at the same time, servicing messages from different senders, so care must be
taken to protect accesses to shared data with a region or local_region construct.

The  receipt  of  the  invoking  message  into  private  storage  is  automatic,  and  the  service  function 
is  called  with  a  pointer  to  the  message  buffer.  For  example,  given  the  service  declaration: 

    service consumer(inch: ch_type, msg) { ... }

The  service  is  actually  invoked  as  if  it  were  a  C  function  declared: 

    void consumer(ch_type *msg) { ... }
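This equivalence can be pictured with a small plain-C sketch of what the runtime might do on delivery. It is an illustration under assumptions, not Maruti's actual implementation. ch_type is given an arbitrary layout, and deliver is a hypothetical name for the runtime step that copies the message into private storage before calling the service body.

```c
#include <string.h>

/* Hypothetical message type for a channel of type ch_type. */
typedef struct {
    int seq;
    double value;
} ch_type;

static ch_type last_seen;

/* The service body as an ordinary C function. It may modify *msg
   freely, because msg points at a private copy of the message. */
void consumer(ch_type *msg)
{
    msg->value *= 2.0;
    last_seen = *msg;
}

/* Hypothetical delivery step: copy the message out of the sender's
   buffer into per-invocation storage, then run the service. */
void deliver(void (*service)(ch_type *), const ch_type *sender_buf)
{
    ch_type private_buf;

    memcpy(&private_buf, sender_buf, sizeof private_buf);
    service(&private_buf);
}
```

The sender's buffer is left untouched by the service, which is one reason to make the receipt into private storage part of the runtime rather than the service code.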
3.6  MPL Channels

In Maruti, channels are one-way, typed communications paths whose traffic patterns are analyzed
and scheduled by the system. The channel endpoints are declared as part of the entry or service
functions which take part in the communication. The endpoints are connected in the MCL
configuration for the application.

The  syntax  of  MPL  channels  is  similar  to  a  C  variable  declaration: 

channel-declaration-list ::= [ channel-declaration { channel-declaration } ].
channel-declaration ::= channel-type channel { channel }.

channel-type ::= 'out' | 'in' | 'in_first' | 'in_last'.
channel ::= <channel-name> ':' type-specifier.

A channel endpoint declaration will normally be either an out endpoint or an in endpoint, used
in the sending thread and the receiving thread, respectively. There are two special variants of the
in endpoint, in_first and in_last, which denote asynchronous channels: communication on them is
not scheduled, and their input buffers are allowed to overflow. For in_first channels, the first
messages received are retained and later ones are dropped; for in_last channels, the most recent
messages are retained and older messages are overwritten.

3.7  Communication  Primitives 

The message-passing primitives appear as normal C function calls, but they are built-in primitives
of the MPL compiler, and their use is recorded so that the communication on each channel can be
analyzed.

The  three  primitives  each  take  a  channel  name  and  a  pointer  to  a  message  buffer.  Their 
declarations  would  look  something  like  this: 

void send (out ch_name, ch_type* message_ptr);

void receive (in ch_name, ch_type* message_ptr);
int optreceive (in ch_name, ch_type* message_ptr);

There are two variants of the receive primitive. A normal receive is used in most cases; it
raises an exception if no message has been delivered at the time it is executed. Normally the
Maruti scheduler arranges things so that this never happens. When a message might not be
present when the receiver runs, as when threads communicate asynchronously over in_first
and in_last channels, or when the sender sometimes does not send the message because of run-time
conditions, optreceive must be used. The optreceive variant checks whether a message is present
and receives it if so. It returns 1 if a message was delivered, or 0 if no message was delivered.
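A single-slot mailbox is enough to contrast the two receive variants. The names `mailbox`, `mb_receive`, and `mb_optreceive` are hypothetical, and where MPL's receive raises an exception this sketch returns an error code instead, since plain C has no exceptions.

```c
#include <assert.h>
#include <string.h>

typedef struct { double value; } ch_type; /* hypothetical message type */

typedef struct {
    ch_type slot;
    int full; /* 1 if a delivered message is waiting */
} mailbox;

#define RECEIVE_OK 0
#define RECEIVE_NO_MESSAGE (-1) /* stands in for MPL's exception */

/* Like optreceive: returns 1 and copies the message if one was delivered, else 0. */
static int mb_optreceive(mailbox *mb, ch_type *message_ptr)
{
    assert(mb != NULL && message_ptr != NULL);
    if (!mb->full)
        return 0;
    memcpy(message_ptr, &mb->slot, sizeof *message_ptr);
    mb->full = 0;
    return 1;
}

/* Like receive: a missing message is an error, because the schedule is
   supposed to guarantee delivery before the receiver runs. */
static int mb_receive(mailbox *mb, ch_type *message_ptr)
{
    return mb_optreceive(mb, message_ptr) ? RECEIVE_OK : RECEIVE_NO_MESSAGE;
}
```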

3.8  Critical  Regions 

Mutual  exclusion  is  often  necessary  to  prevent  the  corruption  of  data  structures  modified  and 
accessed  by  concurrent  threads.  In  Maruti,  the  region  statement  delineates  a  critical  region. 
region-statement ::= ( 'region' | 'local_region' ) <region-name> c-statement.

The local_region variant is used within a task, usually to serialize access by multiple threads
to the task's data structures. The region variant is global to the application and is used to
serialize access to shared buffers and other application-defined resources, as specified in the MCL
configuration for the application.

3.9  Shared  Buffers 

Finally, MPL adds shared buffers to the C language. Shared buffer declarations are similar in
syntax to typedef declarations:

shared-buffer-decl ::= 'shared' <type-specifier> <shared-buffer-name>.

The  shared  buffer  declaration  is  effectively  a  pointer  declaration.  For  example: 

shared some_type shared_buffer;

is treated as if it were a declaration of the form:

some_type *shared_buffer = &some_buffer;

The MCL configuration for the application determines which tasks share each shared memory
area. The runtime system takes care of allocating memory for the shared buffers and initializing
the buffer pointers, so the MPL program can dereference the pointer at any time.
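In plain C terms, the runtime's job resembles the sketch below: it owns the storage and points every sharing task's buffer pointer at the same area before the tasks run. The struct layout, the storage name `some_buffer`, the per-task pointers, and `runtime_init_shared` are all hypothetical; in Maruti the binding is derived from the application's configuration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int samples[4];
    int count;
} some_type;

/* Storage allocated once by the (hypothetical) runtime. */
static some_type some_buffer;

/* Each task's view of the buffer: 'shared some_type shared_buffer;'
   in a task behaves like one of these pointers. */
static some_type *task_a_shared_buffer = NULL;
static some_type *task_b_shared_buffer = NULL;

/* Hypothetical runtime step: bind every sharing task's pointer to the same storage. */
static void runtime_init_shared(void)
{
    task_a_shared_buffer = &some_buffer;
    task_b_shared_buffer = &some_buffer;
}

/* Task A appends a sample through its own pointer. */
static void task_a_step(int sample)
{
    some_type *buf = task_a_shared_buffer;
    assert(buf != NULL);
    if (buf->count < 4)
        buf->samples[buf->count++] = sample;
}
```

In a real application an access like the one in `task_a_step` would normally sit inside a region statement, since several tasks reach the same storage.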

3.10  Restrictions  to  ANSI  C  in  MPL 

The Maruti real-time scheduling methodology requires that the tools be able to analyze the control
flow and stack usage of MPL programs, and that synchronization points be well known. Thus
the following restrictions to ANSI C must be observed by the MPL programmer:

• No receive primitives are allowed within loops or conditionals.

• No region constructs are allowed within loops or conditionals.

• No send primitives are allowed within loops.

• Direct or indirect recursion is not allowed.

• Function calls via function pointers should not be used.
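To see why recursion is excluded, compare a recursive tree traversal, whose stack depth depends on the data, with the loop below, whose worst-case stack usage is fixed at compile time. This is a generic C sketch, not taken from the Maruti tools; the `node` layout and the `MAX_NODES` bound are assumptions. Each node is pushed at most once, so the explicit stack never needs more than `MAX_NODES` slots, and the loop runs at most `n` times.

```c
#include <assert.h>

#define MAX_NODES 16 /* assumed static bound on tree size */
#define NO_CHILD (-1)

typedef struct {
    int value;
    int left, right; /* indices into the node array, or NO_CHILD */
} node;

/* Sum all values in the tree rooted at 'root' without recursion. */
static int tree_sum(const node nodes[], int n, int root)
{
    int stack[MAX_NODES]; /* explicit stack with a compile-time size */
    int top = 0;
    int sum = 0;
    int steps;

    assert(n <= MAX_NODES);
    if (root == NO_CHILD)
        return 0;
    stack[top++] = root;
    /* Explicit loop bound: a tree of n nodes needs exactly n pops. */
    for (steps = 0; steps < n && top > 0; steps++) {
        const node *cur = &nodes[stack[--top]];
        sum += cur->value;
        if (cur->left != NO_CHILD)
            stack[top++] = cur->left;
        if (cur->right != NO_CHILD)
            stack[top++] = cur->right;
    }
    return sum;
}
```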
Maruti Configuration Language (MCL) is used to specify how individual program modules are to
be  connected  together  to  form  an  application  and  to  specify  the  details  of  the  hardware  platform 
on  which  the  application  is  to  be  executed. 

MCL is an interpreted C-like language. The MCL processor is called the integrator. The integrator
interprets the instructions of the MCL program, instantiating and connecting the components
of the application and checking for type correctness as it goes, and outputs the application graph
and all allocation and scheduling constraints for further processing by other Maruti tools.

4.1  Top-level  Declarations 

Like a C program, an MCL configuration file is composed of a number of top-level declarations.
The C preprocessor is invoked first, so the configuration file may contain #include and #define
directives, which make the configuration highly customizable.

configuration ::= {toplevel-declaration}.

toplevel-declaration ::= variable-declaration | system-declaration
    | block-declaration | application-declaration.

The declarations may occur in any order: they do not have to be defined before they are used. The
four types of top-level declaration are described in more detail below.

4.1.1  The  Application  declaration 

application-declaration ::= 'application' <application-name>
    '{' {instruction} '}'.

Like  the  main  function  of  C,  the  application  declaration  is  where  the  integrator  will  begin 
execution  of  the  configuration  directives.  Only  one  application  may  be  declared  in  the  configuration. 

4.1.2  The  System  declaration 

system-declaration ::= 'system' <system-name>
    '{' {node-declaration} '}'.

node-declaration ::= 'node' <variable-name> ['with' attributes].
attributes ::= attribute {',' attribute}.

attribute ::= <symbol> ['=' <integer> | '=' <symbol> | '=' <string>].

Like the application declaration, the system declaration can occur at most once in a configuration.
It is not needed for single-node operation. The system declaration names the nodes that an
application will run on and specifies attributes for them. For example:


system hdw {
    node northstar with address = "{0x00,0xe0,0xec,0xb1,0xfb,0xce}", master;
    node raduga with address = "{0x00,0x60,0x8c,0xb1,0xf6,0x67}";
}


The integrator does not assign any meaning to the attributes declared for the nodes; it just
passes the information along. However, the Maruti binder does require the address attribute for
each node, which specifies the node's Ethernet address, and the master attribute on only one node,
to specify which node will be the boot and time master. The Maruti/Virtual environment further
requires that the node <variable-name> correspond to the hostname of the node in the testbed
environment.

4.1.3  Block  declarations 

block-declaration ::= 'block' <block-name> '(' [block-parameters] ')'
    block-parameter-channels
    '{' {instruction} '}'.

A block is something like a function in C. Once a block is declared, it may be called by any
other block, except that self-recursion is not allowed. A block cannot be declared inside another
block. A block is called by giving its name and parameters. There are two kinds of parameters:
classical parameters and channel parameters.

block-parameters ::= parameter { ',' parameter }.
parameter ::= ['var'] <parameter-name> ['[]'] [':' type].

Classical parameters are like function parameters in C or Pascal. They can be passed by value
or by variable (var for variable passing). Arrays may also be given as var parameters. The type
must be given for the first parameter. It may be omitted for the following parameters: the
integrator assumes that a parameter with no given type has the same type as the previous
parameter.

block-parameter-channels ::= { ('in' | 'out') channel-names }.
channel-names ::= channel { channel }.
channel ::= <channel-name> [ '[' <integer> ']' ].
The channel parameters describe the inputs and outputs of the block. The in and out keywords
do not have exactly the same meaning as in MPL: they only show which channels are connected
at the left and which are connected at the right of the block call (see section 4.2.4, Connections).
The communication type of the channel (in_first, in_last, or synchronous) and the type of the
messages on the channel are determined by the connections of the channels to the tasks.

When there is an array of channel parameters, the connections are made in ascending index order.
For  example: 


block foo()
in ch[3];
{
    <...>
}

application bax {
    channel a[3];

    <a[0..2]> foo() <>;  /* a[0]->ch[0], a[1]->ch[1], a[2]->ch[2] */
}


4.1.4  Variable  declarations 

variable-declaration ::= type variable-names

type ::= 'float' | 'int' | 'string' | 'time' | 'channel'
    | 'task' | 'job' | 'node' | 'shared' | 'region'.

variable-names ::= variable { ',' variable }.

variable ::= <variable-name> ['[' <integer> ']'] ['[' <integer> ']'].

Variables  may  be  declared  globally  at  the  top-level,  or  locally  in  a  block.  Global  variables 
can  be  accessed  in  all  blocks,  while  local  variables  can  only  be  accessed  in  the  block  where  they 
are  declared.  A  local  variable  (or  a  parameter)  may  be  declared  with  the  same  name  as  a  global 
variable.  In  this  case  only  the  local  variable  (or  the  parameter)  can  be  accessed  in  the  block. 

The  order  of  the  variable  declarations  does  not  matter.  For  example: 


block foo()
{
    i = 4s + 5mn;  /* correct */

    time i;
}


Arrays may be declared. As in C, array indices are numbered from 0 to the size of the array
minus 1. Arrays of one or two dimensions are accepted. For example:
block foo()
{
    string s[10];

    s[5] = "a string";     /* correct */
    s[0] = s[5] + " foo";  /* correct */
    s[10] = ...;           /* incorrect: out of array limits */
}


4.2  Instructions 


The  MCL  integrator  interprets  a  number  of  instructions  that  express  the  way  an  application  is  to 
be  built  up  from  components.  The  different  instructions  are  explained  below. 


instruction ::= variable-declaration
    | task-initialization
    | job-initialization
    | connect-declaration
    | link-instruction
    | allocation-instruction
    | expression ';'
    | print-instruction
    | compound-instruction
    | '{' {instruction} '}'.


4.2.1  Compound  instructions 

compound-instruction ::= 'if' '(' test-expression ')' instruction
    | 'if' '(' test-expression ')' instruction 'else' instruction
    | 'do' instruction 'while' '(' test-expression ')' ';'
    | 'while' '(' test-expression ')' instruction
    | 'for' '(' expression ';' test-expression ';' expression ')'
      instruction.

test-expression ::= expression.

The  meaning  of  these  constructs  is  the  same  as  in  the  C  language.  The  test-expression 
should  evaluate  to  an  integer,  where  0  means  false,  and  all  other  values  mean  true. 


4.2.2  Tasks 

task-initialization ::= 'start' names <module-name> [module-parms]
    [instantiation] [task-allocation] ';'.

module-parms ::= '(' [module-parameter-list] ')'.
module-parameter-list ::= expression { ',' expression }.

instantiation ::= 'with' <symbol> '=' constant
    { ',' <symbol> '=' constant }.

task-allocation ::= 'on' expression.


A variable of type task must be initialized before it can be used. This initialization consists of
giving the name of a module: the task will be an instantiation of this module. Module parameters
may be given; after evaluation they are passed to the initializer thread of the module. The
initializations during the loading of an application take place in exactly the same order as they
are encountered by the integrator during the execution of the configuration.

All the shared buffers and global regions of the module must be instantiated using the with
clause, by giving the corresponding shared or region variables.

The  on  clause  may  be  used  to  force  allocation  of  the  task  on  a  particular  node. 

4.2.3  Job  initialization 

job-initialization ::= 'init' names timing-job.
timing-job ::= { 'period' expression }.

A  variable  of  type  job  must  be  initialized  before  it  can  be  used.  The  job  will  refer  to  a  collection 
of  threads  with  the  same  period. 

4.2.4  Connections 

connect-declaration ::= chan-list connect-name chan-list
    [in-job] {timing-service} [task-alloc] ';'.
chan-list ::= '<' [names] '>'.

connect-name ::= <task-name> ['[' expression ']'] <routine-name>
    | <block-name> '(' [expression {',' expression}] ')'.
in-job ::= 'in' constant.

timing-service ::= ( 'ready' expression | 'deadline' expression ).

There are two types of connections: routine connections and block connections. In both cases
the inputs are connected (or mapped) to the channels declared at the left of the connection and
the outputs to those at the right. The number of input (or output) channels must be the same as in
the definition of the routine (or the block). The mapping follows the order of this definition.

In a routine connection, the inputs and outputs of an entry or a service of a task are connected to
channels. This connection creates a new instance of the service if the routine is a service;
otherwise it creates the only instance of the entry. An entry cannot be connected more than once.

For an entry connection, a job name must be given; the entry will be part of this job. For a
service, a job cannot be declared, because the job of the service is implicitly given by the
connection: the first input channel of a service is its triggering channel, and the job of the
service is the same as the job of the origin of the triggering channel.

A  timing  characterization  may  only  be  given  to  a  routine  connection. 
In a block connection, the input and output channels of the block are mapped to the given
channels. A mapping is also done for all the block parameters, following the order in the block
definition. The number of parameters must be the same as in this definition, and all the types
must be compatible.

4.2.5  Allocation  Instructions 

allocation-instruction ::= 'separate' '(' names ')' ';'
    | 'together' '(' names ')' ';'.

A  separate  instruction  is  a  command  to  the  allocator  to  keep  the  tasks  on  different  nodes  in 
the  final  system.  A  together  instruction  specifies  that  all  tasks  must  be  allocated  to  the  same 
node. 

4.2.6  Link  Instruction 

link-instruction ::= 'link' expression 'to' expression ';'.

In a few cases, connections alone are not sufficient to describe the communication graph within
the structure of the blocks. In these cases a link instruction may be used.

A  link  between  two  channels  means  that  the  two  channels  are  the  same. 

For example, to connect an input channel of a block directly to one of its output channels, a
link must be used.




block foo()
in in_channel;
out out_channel;
{
    link in_channel to out_channel;
}


4.2.7  Print  Instruction 

print-instruction ::= 'print' '(' expression { ',' expression } ')' ';'.

The print instruction outputs messages to the standard output during integration. This
instruction can be used for debugging a configuration file. Any string, number, or time may
be printed. A newline is added at the end.

4.3  Expressions 

Expressions  in  MCL  are  very  similar  to  C  expressions: 

expression ::= expression '=' expression
    | expression '||' expression
    | expression '&&' expression
    | expression '==' expression
    | expression '!=' expression
    | expression '<' expression
    | expression '>' expression
    | expression '<=' expression
    | expression '>=' expression
    | expression ('d' | 'mn' | 's' | 'ms' | 'us')
    | expression '+' expression
    | expression '-' expression
    | expression '*' expression
    | expression '/' expression
    | expression '%' expression
    | '!' expression
    | '(' expression ')'
    | expression '++'
    | expression '--'
    | constant.

constant ::= <symbol> [ '[' expression ']' ] [ '[' expression ']' ]
    | <symbol> '[' expression '..' expression ']' [ '[' expression ']' ]
    | <symbol> '[' expression ']' '[' expression '..' expression ']'
    | <integer>
    | <double>
    | <string>.

In addition to the usual C expressions, MCL supports time unit expressions; for example,
'3 s + 500 ms' is a time expression that evaluates to 3.5 seconds.
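The arithmetic behind a time unit expression is scaling to a common unit before adding. The sketch below is hypothetical, since the integrator's internal time representation is not documented here: it converts to microseconds and handles only the units used in this chapter's examples, assuming `mn` means minutes. The `d` unit in the grammar is left out because its meaning is not stated.

```c
#include <assert.h>
#include <string.h>

/* Convert 'amount' of 'unit' to microseconds, storing the result in *out.
   Returns 1 on success, 0 if the unit is not recognized.
   Assumption: "mn" means minutes. */
static int to_microseconds(long long amount, const char *unit, long long *out)
{
    assert(unit != NULL && out != NULL);
    if (strcmp(unit, "us") == 0)
        *out = amount;
    else if (strcmp(unit, "ms") == 0)
        *out = amount * 1000LL;
    else if (strcmp(unit, "s") == 0)
        *out = amount * 1000000LL;
    else if (strcmp(unit, "mn") == 0)
        *out = amount * 60LL * 1000000LL;
    else
        return 0;
    return 1;
}
```

With this helper, '3 s + 500 ms' becomes 3000000 + 500000 = 3500000 microseconds, which is 3.5 seconds.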

MCL also supports array range notation as a shorthand for lists. For example, the expression
c[2..4] is shorthand for c[2], c[3], c[4]. This notation is most often used for passing arrays
of channel values to blocks or in connection instructions.


Chapter  5 

Maruti  Runtime  System  Reference 


The Maruti runtime system is bound together with the application binary files by the mbind utility.
Only those parts of the runtime needed by the application are linked in. Several versions of the
runtime system are available, depending on the environment in which the application will be run.
For example, there are two different versions of the core library: a stand-alone version that can
boot directly on bare hardware, and a Unix version that runs as a user-level process under Unix,
providing virtual-time execution and access to debugging tools.

The library versions that an application links with are called flavors. Flavors are specified
by the programmer as strings of library names separated by a '+', for example, 'ux+x11'.


5.1  Core  Library  Reference 

#include <maruti/maruti-core.h>

The  Maruti  core  library  implements  the  scheduling,  thread  and  memory  management,  and 
network  communication  subsystem.  It  provides  primitives  for  applications  to  send  and  receive 
messages,  insert  preemption  points,  manipulate  the  schedule  (via  calendars),  and  do  time  and  date 
calculations.  There  are  currently  two  flavors  of  the  core  library: 

•  sa  -  The  Maruti/Standalone  core  library.  Applications  linked  with  this  flavor  can  be  booted 
directly  (by  the  NetBSD  boot  blocks).  It  includes  the  distributed  operation  support,  based 
on  the  3Com  3c507  Etherlink/16  adapter. 

• ux - The Maruti/Virtual debugging core library. Applications linked with this flavor are
run as normal Unix processes from the NetBSD command line. It includes a virtual-time
scheduler and debugging monitor (described below) and implements distributed operation
using normal Unix TCP/IP networking facilities.

5.1.1  MPL  Built-in  Primitives 
void maruti_eu(void)

The maruti_eu primitive inserts a Maruti EU break into the program at the location of the
call. It is not normally used explicitly in an application, as the system tools put EU breaks
where necessary for synchronization. It is useful, however, for breaking up long-running EUs: the
maruti_eu call then serves as a possible preemption point.

void send(out ch_name, ch_type* message_ptr)
void receive(in ch_name, ch_type* message_ptr)
int optreceive(in ch_name, ch_type* message_ptr)

The communication primitives are documented in section 3.7 of the MPL reference chapter.

5.1.2  Calendar  Switching 

int maruti_calendar_activate(int calendar_num, mtime switch_time, mtime offset_time)
void maruti_calendar_deactivate(int calendar_num, mtime switch_time)

Maruti calendars may be activated and deactivated (switched on or off) at any time. The
switch_time is the time at which the activation or deactivation should take place. The switch can
occur at any point in the future, and switch requests can arrive out of order with respect to their
switch times. Requests with the same switch time are executed in the order in which they were made.
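The ordering rule for switch requests amounts to a queue kept sorted by switch time, where a new request goes after any existing requests with the same time. The `switch_req` type, the queue capacity, and `enqueue_switch` are hypothetical; the runtime's actual data structures are not described here. The insertion stops shifting at an equal time, which is what keeps ties in request order.

```c
#include <assert.h>

#define MAX_REQS 8 /* hypothetical queue capacity */

typedef struct {
    long long switch_us; /* switch time, in microseconds */
    int calendar_num;
    int activate;        /* 1 = activate, 0 = deactivate */
} switch_req;

static switch_req queue[MAX_REQS];
static int queue_len = 0;

/* Insert a request, keeping the queue ordered by switch time; requests with
   equal switch times keep the order in which they were made.
   Returns 1 on success, 0 if the queue is full. */
static int enqueue_switch(switch_req r)
{
    int i;
    assert(queue_len >= 0 && queue_len <= MAX_REQS);
    if (queue_len == MAX_REQS)
        return 0;
    i = queue_len;
    /* Shift only strictly later requests to the right. */
    while (i > 0 && queue[i - 1].switch_us > r.switch_us) {
        queue[i] = queue[i - 1];
        i--;
    }
    queue[i] = r;
    queue_len++;
    return 1;
}
```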

Calendars can be activated with a particular offset_time, which is the relative position within
the calendar at which to start executing at the switch time. The offset time will normally be zero,
but can be any relative time up to the lcm_time of the calendar.

The runtime system does not check the feasibility of the combined schedules represented by the
calendars; that should be done offline.

5.1.3  Calendar  Modification 

void maruti_calendar_create(int calendar_num, int num_entries, mtime lcm_time)
void maruti_calendar_delete(int calendar_num)

Calendars are normally created offline and compiled into the Maruti application, but it is
possible to create new calendars at runtime. The application is responsible for ensuring that the
generated schedules are feasible.

When a calendar is created, the maximum number of entries it will contain must be specified,
as well as the lcm_time, which is the period of the calendar as a whole. At the end of its period,
the calendar wraps around and begins executing from the beginning again.
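Combining the activation offset with this wrap-around, the position within a calendar at a later time is (offset_time + (now - switch_time)) mod lcm_time. This formula is our reading of the description above, not code from the Maruti runtime, and the sketch uses plain microsecond counts rather than `mtime` values to stay short; `calendar_position` is a hypothetical name.

```c
#include <assert.h>

/* Position (in microseconds) within a calendar of period lcm_us, given the
   activation (switch) time, the activation offset, and the current time. */
static long long calendar_position(long long now_us, long long switch_us,
                                   long long offset_us, long long lcm_us)
{
    assert(lcm_us > 0);
    assert(now_us >= switch_us);
    assert(offset_us >= 0 && offset_us <= lcm_us);
    return (offset_us + (now_us - switch_us)) % lcm_us;
}
```

For a 500 ms calendar activated at t = 1 s with zero offset, the position at t = 1.7 s is 200 ms, because the calendar has already wrapped once.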

typedef struct calendar_s {
    entry_t *entries;
    int num_entries;
    mtime lcm_time;
    mtime base_time;
    entry_t *cur;   /* cur - entries is the current offset */

    <...>

} calendar_t;

typedef struct {
    int eu_thread, eu_id;
    mtime eu_start, eu_deadline;
    int eu_type;
#   define EU_EMPTY     0  /* empty EU slot */
#   define EU_PERIODIC  1  /* periodic EU */

    <...>

} entry_t;

void maruti_calendar_get_header(int calendar_num, calendar_t *calendarp)

void maruti_calendar_get_entry(int calendar_num, int entry_num, entry_t *entryp)

void maruti_calendar_set_entry(int calendar_num, int entry_num, entry_t entry)

The maruti_calendar_set_entry call is used to populate new calendars. It can overwrite
any entry in any inactive calendar. The entry's eu_start and eu_deadline times are the earliest
start time and latest end time, respectively. The eu_id serves to identify the EU when tracing or
reporting timing results.

The  maruti_calendar_get_header  and  maruti_calendar_get_entry  calls  can  be  used  to 
query  the  contents  of  a  calendar.  These  are  useful  when  ‘cloning’  an  existing  calendar  into  a  new 
calendar,  perhaps  with  modifications. 

5.1.4  Date  and  Time  Manipulation 
#include  <maruti/mtime.h> 

The  Maruti  core  library  provides  routines  and  macros  for  simple  time  and  date  calculations. 

typedef struct {
    long seconds;
    long microseconds;
} mtime;

#define time_cmp(a,b)          /* like strcmp: 0 if equal, < 0 if a < b, etc. */
#define time_add(a,b)          /* a += b */
#define time_sub(a,b)          /* a -= b */
#define time_add_scalar(t, s)  /* t += s (s is an int, in microseconds) */
#define time_sub_scalar(t, s)  /* t -= s (s is an int, in microseconds) */
#define time_mul_scalar(t, s)  /* t *= s (s is an int) */
#define time_div_scalar(t, s)  /* t /= s (s is an int) */

The mtime type is the basic Maruti time structure. A number of convenience macros for arithmetic
on mtime values are provided. Two mtime values may be compared, added, or subtracted.
In addition, an integer time in microseconds may be added to and subtracted from an mtime value,
and mtime values may be multiplied or divided by integer scaling factors.

Note: The microseconds field is always in the range 0 to 999999, and the time represented
by an mtime value is always the number of seconds plus the number of microseconds. These rules
hold even for negative mtime values, which can arise when subtracting mtimes. Thus the mtime
representation for the time -1.3 seconds is { -2, 700000 }.
void maruti_get_current_time(mtime *curtime)

The current system time is returned by maruti_get_current_time. Maruti, like Unix, represents
absolute time as the number of seconds and microseconds since the Epoch, defined as
00:00 GMT on January 1, 1970.

typedef struct {
    short year;            /* year - 1900 */
    short month;           /* month (0..11) */
    short wday;            /* day of week (0..6) */
    short mday;            /* day of month (1..31) */
    short yday;            /* day of year (0..365) */
    short second, minute;  /* 0..59 */
    short hour;            /* 0..23 */
    int microsecond;       /* 0..999999 */
} mdate;

mtime maruti_date_to_time(mdate d)
mdate maruti_time_to_date(mtime t)

mtime maruti_gmtdate_to_time(mdate d)
mdate maruti_time_to_gmtdate(mtime t)

int maruti_set_gmtoff(int gmtoff)
int maruti_get_gmtoff(int *gmtoffp)

Applications  will  often  want  to  view  the  time  as  something  more  convenient  than  the  number 
of  seconds  since  the  Epoch.  The  Maruti  mdate  type  denotes  a  time  expressed  as  a  date  plus 
a  time  of  day.  The  functions  maruti_time_to_gmtdate  and  maruti_gmtdate_to_time  convert 
between  mtime  and  mdate  values  using  the  GMT  timezone.  The  functions  maruti_time_to_date 
and  maruti_date_to_time  convert  using  the  local  offset  from  GMT. 

The local timezone used in these conversions is initially set by the runtime system, but may
be changed by the application. The timezone is expressed as an offset from GMT in seconds. For
example, the U.S. timezone EST is 5 hours behind GMT, an offset of -18000 seconds.
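The effect of the GMT offset can be checked with the standard C library, which is not part of Maruti: adding the offset to the Epoch-based seconds and breaking the result down as GMT yields the local calendar date. `local_date` is a hypothetical helper; it mirrors what maruti_time_to_date is described as doing, under the assumption that the offset is simply added to the seconds count.

```c
#include <assert.h>
#include <time.h>

/* Break 'seconds_since_epoch' down into a local date for a timezone that is
   'gmtoff' seconds from GMT (e.g. -18000 for EST). Returns 0 on success. */
static int local_date(time_t seconds_since_epoch, long gmtoff, struct tm *out)
{
    time_t shifted;
    struct tm *tmp;

    assert(out != NULL);
    shifted = seconds_since_epoch + (time_t)gmtoff;
    tmp = gmtime(&shifted); /* interpret the shifted value as GMT */
    if (tmp == NULL)
        return -1;
    *out = *tmp; /* copy out of gmtime's static buffer */
    return 0;
}
```

At the Epoch itself, an EST offset gives 19:00 on December 31, 1969. The mdate fields listed above (year - 1900, month 0..11, day of month 1..31) follow the same conventions as the standard struct tm.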

Note: Maruti does not at this time attempt to handle leap seconds or to switch the local
timezone automatically for daylight saving time. The cost of providing these features in code
and table space was deemed prohibitive.

5.1.5  Miscellaneous  Functions 
void  quit (int  exitcode) 

The  quit  call  terminates  the  application.  The  exit  code  is  not  usually  relevant  in  an  embedded 
system,  but  will  be  returned  to  the  environment  where  that  makes  sense  (such  as  in  the  Unix 
debugging  environment). 
5.2 Console Library Reference

#include <maruti/console.h>

The  Maruti  console  graphics  library  provides  access  to  the  console  device,  including  the  keyboard 
and  speaker,  but  most  importantly  the  graphical  display.  The  graphics  library  includes  support 
for  placing  text  anywhere  on  the  screen,  simple  2d  geometry  primitives  suitable  for  generating  line 
and  bar  graphs,  and  includes  optimized  routines  for  moving  bitmaps  without  flicker,  for  animated 
simulations.  There  are  currently  three  flavors  of  the  graphics  library  implemented: 

• et4k - This flavor supports Super VGA graphics cards based on the Tseng Labs ET4000 chip
and its accelerated descendants, such as the ET4000/W32. The et4k flavor runs the screen at a
resolution of 1024x768 in 256-color mode.

• vga16 - This flavor supports all standard VGA graphics cards, running the screen at a
resolution of 640x480 in 16-color banked mode.

• x11 - This flavor works with the ux core flavor, displaying the Maruti screen in an X11
window under Unix.

5.2.1  Screen  Colors 

The Maruti console graphics library supports the following colors, defined in <maruti/console.h>:


#define BLACK        0
#define DARK_BLUE    1
#define DARK_GREEN   2
#define DARK_CYAN    3
#define DARK_RED     4
#define DARK_VIOLET  5
#define DARK_YELLOW  6
#define DARK_WHITE   7
#define BROWN        8
#define BLUE         9
#define GREEN        10
#define CYAN         11
#define RED          12
#define VIOLET       13
#define YELLOW       14
#define WHITE        15

/* aliases */
#define GREY         DARK_WHITE
#define GRAY         DARK_WHITE

The  maximum  screen  size  supported  is  also  defined: 

#define CONSOLE_WIDTH   1024
#define CONSOLE_HEIGHT  768
5.2.2 Graphics Functions
void cons_graphics_init(void)

The cons_graphics_init function must be called before any other graphics functions, usually
from the maruti_main function of the application's screen driver task.

void cons_fill_area(int x, int y, int width, int height, int color)
void cons_xor_area(int x, int y, int width, int height, int color)

These functions paint an area of the screen, specified by its upper-left coordinates (x, y) and
its width and height, in the given color. The cons_fill_area variant overwrites the previous
contents of that area of the screen, while cons_xor_area exclusive-ORs the screen contents with
the specified color.
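The difference between the overwrite and exclusive-or modes can be seen on an in-memory framebuffer. This sketch uses a hypothetical 8x8 buffer of one-byte color indices and helper functions `fill_area`/`xor_area` of our own; it does not reflect how the console library drives real display hardware. The property that matters for animation is that XOR-ing the same color over the same area twice restores the original pixels.

```c
#include <assert.h>

#define FB_W 8
#define FB_H 8

static unsigned char fb[FB_H][FB_W]; /* hypothetical framebuffer of color indices */

/* Overwrite a rectangle with 'color', clipped to the framebuffer. */
static void fill_area(int x, int y, int width, int height, int color)
{
    int r, c;
    assert(width >= 0 && height >= 0);
    for (r = y; r < y + height; r++)
        for (c = x; c < x + width; c++)
            if (r >= 0 && r < FB_H && c >= 0 && c < FB_W)
                fb[r][c] = (unsigned char)color;
}

/* Exclusive-or a rectangle with 'color', clipped to the framebuffer. */
static void xor_area(int x, int y, int width, int height, int color)
{
    int r, c;
    assert(width >= 0 && height >= 0);
    for (r = y; r < y + height; r++)
        for (c = x; c < x + width; c++)
            if (r >= 0 && r < FB_H && c >= 0 && c < FB_W)
                fb[r][c] ^= (unsigned char)color;
}
```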

void cons_draw_pixel(int x, int y, int color)
void cons_xor_pixel(int x, int y, int color)

These  functions  draw  and  xor,  respectively,  a  single  pixel  at  (x,  y)  in  the  specified  color. 

void cons_draw_line(int x1, int y1, int x2, int y2, int color)
void cons_xor_line(int x1, int y1, int x2, int y2, int color)

These functions draw and xor, respectively, a single-pixel-wide line from coordinates (x1, y1)
to (x2, y2) in the specified color.

void cons_draw_bitmap(int x, int y, int width, int height,
                      void *bitmap, int color)
void cons_xor_bitmap(int x, int y, int width, int height,
                     void *bitmap, int color)

These functions draw and xor, respectively, a width-by-height bitmap onto the screen in the specified color, with its upper-left corner at (x, y). The bitmap is in standard X bitmap format, with eight pixels per byte and each scan line occupying a whole multiple of eight pixels.
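The pixel layout of such a bitmap can be made concrete with a short lookup function. It assumes the usual XBM conventions (the least significant bit of each byte is the leftmost pixel, and each scan line starts on a new byte); the function name xbm_pixel is hypothetical and not part of the Maruti library.

```c
#include <assert.h>

/* Return 1 if pixel (col, row) of an X bitmap of the given width is set.
   Assumed XBM layout: eight pixels per byte, least significant bit first,
   each scan line padded to a whole number of bytes. */
int xbm_pixel(const unsigned char *bits, int width, int col, int row)
{
    assert(width > 0 && col >= 0 && col < width && row >= 0);
    int bytes_per_line = (width + 7) / 8;          /* round up to whole bytes */
    unsigned char b = bits[row * bytes_per_line + col / 8];
    return (b >> (col % 8)) & 1;                   /* bit 0 = leftmost pixel */
}
```

For a 10-pixel-wide bitmap each scan line occupies two bytes, so the data {0x01, 0x02, 0x80, 0x00} sets pixels 0 and 9 of the first scan line and pixel 7 of the second.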

void cons_move_bitmap(int x1, int y1, int x2, int y2, int width, int height,
                      void *bitmap, int color)
void cons_xor_move_bitmap(int x1, int y1, int x2, int y2, int width, int height,
                          void *bitmap, int color)

These functions optimize the erasing and redrawing of a bitmap by combining the operations into one loop, modifying one scan line at a time. This optimization eliminates the flicker that can occur when the entire bitmap is erased and then redrawn, making animations smoother.

The call cons_move_bitmap(x1,y1,x2,y2,w,h,b,c) is equivalent to the sequence:

cons_draw_bitmap(x1,y1,w,h,b,BLACK);
cons_draw_bitmap(x2,y2,w,h,b,c);
The call cons_xor_move_bitmap(x1,y1,x2,y2,w,h,b,c) is equivalent to the sequence:

cons_xor_bitmap(x1,y1,w,h,b,c);
cons_xor_bitmap(x2,y2,w,h,b,c);
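For the xor variant, interleaving the two calls one scan line at a time cannot change the final image, because XOR operations commute. The following minimal sketch demonstrates this on a hypothetical in-memory frame buffer. The sketch_* names, the frame-buffer model, and the treatment of clear bitmap bits as transparent are assumptions for illustration, not the library's code. The opaque cons_move_bitmap variant needs more care when the old and new rectangles overlap, because there the order of writes matters, so it is not modelled here.

```c
#include <assert.h>
#include <string.h>

#define FB_W 32
#define FB_H 16
unsigned char fb[FB_H][FB_W];      /* hypothetical frame buffer of color indices */

void sketch_clear(int color)
{
    memset(fb, color, sizeof fb);
}

/* Assumed XBM layout: LSB = leftmost pixel, rows padded to whole bytes. */
static int bit_at(const unsigned char *bits, int w, int col, int row)
{
    return (bits[row * ((w + 7) / 8) + col / 8] >> (col % 8)) & 1;
}

/* XOR scan line `row` of the bitmap into the screen at (x, y);
   clear bits are left untouched (assumed transparent). */
static void xor_row(int x, int y, int w, int row,
                    const unsigned char *bits, int color)
{
    for (int col = 0; col < w; col++)
        if (bit_at(bits, w, col, row))
            fb[y + row][x + col] ^= (unsigned char)color;
}

void sketch_xor_bitmap(int x, int y, int w, int h,
                       const unsigned char *bits, int color)
{
    assert(x >= 0 && y >= 0 && x + w <= FB_W && y + h <= FB_H);
    for (int row = 0; row < h; row++)
        xor_row(x, y, w, row, bits, color);
}

/* Single pass: each scan line is erased at (x1, y1) and immediately
   redrawn at (x2, y2) before moving on to the next line. */
void sketch_xor_move_bitmap(int x1, int y1, int x2, int y2, int w, int h,
                            const unsigned char *bits, int color)
{
    assert(x1 >= 0 && y1 >= 0 && x1 + w <= FB_W && y1 + h <= FB_H);
    assert(x2 >= 0 && y2 >= 0 && x2 + w <= FB_W && y2 + h <= FB_H);
    for (int row = 0; row < h; row++) {
        xor_row(x1, y1, w, row, bits, color);
        xor_row(x2, y2, w, row, bits, color);
    }
}
```

Moving a sprite that was first drawn with an xor therefore takes one call, and the erase and redraw of each scan line happen back to back instead of in two full passes over the bitmap.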

void cons_puts(int x, int y, int color, char *string)
void cons_xor_puts(int x, int y, int color, char *string)

These  functions  draw  and  xor,  respectively,  a  text  string  at  (x,  y)  in  the  specified  color. 

5.2.3 Keyboard and Speaker Functions

typedef struct {
    unsigned char device;   /* just keyboard works for now */
#   define EVENT_OTHER     0
#   define EVENT_KEYBOARD  2
    unsigned char keycode;
} console_event_t;

int cons_poll_event(console_event_t *event)

The cons_poll_event call returns 1 if a console event has occurred, 0 otherwise. If there is a pending console event, the event structure is filled in: the device field is set to EVENT_KEYBOARD and the keycode field is set to the scan code of the key that was pressed. The list of scan codes is in <maruti/keycodes.h>.
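A screen driver typically drains pending keyboard events each time it runs. The sketch below shows one way to do that under two assumptions. Because MPL restricts unbounded loops (see the language description earlier in this report), the loop is capped by a hypothetical MAX_EVENTS_PER_CALL. And so that the example runs outside Maruti, cons_poll_event is replaced by a hypothetical stub, stub_poll_event, which serves scan codes from a small array with the same return-value contract.

```c
#include <assert.h>

typedef struct {
    unsigned char device;
#   define EVENT_OTHER     0
#   define EVENT_KEYBOARD  2
    unsigned char keycode;
} console_event_t;

/* Hypothetical stand-in for cons_poll_event: returns 1 and fills in *event
   while queued scan codes remain, 0 otherwise. */
static unsigned char stub_queue[8];
static int stub_len, stub_next;

int stub_poll_event(console_event_t *event)
{
    if (stub_next >= stub_len)
        return 0;                              /* nothing pending */
    event->device = EVENT_KEYBOARD;
    event->keycode = stub_queue[stub_next++];
    return 1;
}

#define MAX_EVENTS_PER_CALL 4   /* hypothetical bound keeps the loop analyzable */

/* Handle at most MAX_EVENTS_PER_CALL pending key presses; remember the
   last scan code seen and return how many were handled. */
int drain_keyboard(unsigned char *last_key)
{
    console_event_t ev;
    int handled = 0;
    for (int i = 0; i < MAX_EVENTS_PER_CALL && stub_poll_event(&ev); i++) {
        if (ev.device == EVENT_KEYBOARD) {
            *last_key = ev.keycode;
            handled++;
        }
    }
    return handled;
}
```

Events left over when the cap is reached simply remain pending for the next invocation. In a real application the stub would be replaced by cons_poll_event, and the scan codes would be compared against the constants in <maruti/keycodes.h>.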

void  cons_start_beep(int  pitch) 
void cons_stop_beep(void)

The console speaker can be turned on and off with these functions. The cons_start_beep call programs the speaker to sound at a particular frequency, in hertz, and cons_stop_beep turns it off.

5.3  Maruti/Virtual  Monitor 

The ux flavor of the Maruti core library includes some basic debugging facilities called the Maruti monitor. While an application compiled with ux is running, aspects of its execution can be controlled from the Unix tty (which is distinct from the console keyboard device). The monitor provides the following facilities:

•  Tracing  scheduler  actions.  The  user  can  independently  toggle  the  tracing  of  elemental 
unit  executions,  calendar  wrap-around  events,  and  calendar-switch  events. 

•  Single-stepping  calendars  or  elemental  units.  The  user  can  toggle  single  stepping 
through  each  elemental  unit  execution,  or  a  whole  calendar’s  execution. 
•  Controlling virtual-time execution speed. The user can control the speed of the application in two ways. First, the user can toggle as-soon-as-possible execution of elemental units, called asap mode. Second, the user can set the speed at which virtual time advances relative to real clock time.

•  Both  single-keystroke  and  command-line  operation.  All  monitor  switches  may  be 
toggled  with  a  single  keystroke  while  the  application  continues  running.  Also,  the  user  can 
enter  a  command-line  mode  in  which  various  parts  of  the  system  state  may  be  queried  and 
modified. 

5.3.1  Controlling  Virtual  Time 

The  Maruti  monitor  contains  a  user-settable  speed  variable  which  determines  the  rate  at  which 
virtual  time  advances  relative  to  the  actual  clock  time. 

The speed may be set to any floating-point value greater than zero. Thus virtual time may be set to run, for example, five times faster than clock time (speed = 5) or four times slower (speed = 0.25). The achievable speed-up is bounded by the CPU utilization of the application: the execution of application code cannot be sped up, only the idle time between executions.

Idle time can be eliminated completely by turning on as-soon-as-possible scheduling of elemental units (asap mode). In asap mode, virtual time is advanced to the start time of the next elemental unit as soon as the previous one completes, so that all EUs execute in immediate succession. Asap mode is separate from the speed variable: it can be toggled independently, and when it is turned off, scheduling continues at the previously set speed.
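The relationship between the speed setting, asap mode, and CPU utilization can be made concrete with a simple timing model. The sketch below illustrates the model under stated assumptions and is not the monitor's implementation. It assumes that the execution time of each elemental unit is unaffected by speed, that each idle gap in the calendar is scaled by a factor of 1/speed, and that asap mode removes idle gaps entirely. The function name and the calendar representation are hypothetical.

```c
#include <assert.h>

/* Real time needed to run one calendar cycle of cycle_len virtual-time units.
   EU k is scheduled at virtual time start[k] (sorted ascending) and runs for
   dur[k]; execution is never scaled, idle gaps are divided by `speed`, and
   asap mode skips them altogether. */
double real_time_for_cycle(const double *start, const double *dur, int n,
                           double cycle_len, double speed, int asap)
{
    assert(speed > 0.0 && n >= 0);
    double real = 0.0;
    double vnow = 0.0;                        /* current virtual time */
    for (int k = 0; k < n; k++) {
        if (start[k] > vnow) {                /* idle gap before EU k */
            if (!asap)
                real += (start[k] - vnow) / speed;
            vnow = start[k];
        }
        real += dur[k];                       /* application code cannot be sped up */
        vnow += dur[k];
    }
    if (cycle_len > vnow && !asap)            /* idle until calendar wrap-around */
        real += (cycle_len - vnow) / speed;
    return real;
}
```

With hypothetical EUs starting at 0, 10 and 30 and running for 2, 3 and 5 units in a 50-unit calendar, one cycle takes 50 real units at speed 1, 30 at speed 2, 90 at speed 0.5, and 10 in asap mode. The application is busy for 10 of every 50 virtual units (20% utilization), so under these assumptions no setting can run it more than 5 times faster than clock time.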

5.3.2  Single-Keystroke  Operation 

The  following  keys  are  active  from  the  Unix  tty  session  (not  the  console  keyboard)  while  the 
application  is  running: 

?        show the list of keystrokes and the current values of the toggle switches
a        toggle as-soon-as-possible mode
e        toggle elemental unit tracing
c        toggle calendar tracing
X        toggle calendar-switch tracing
s        toggle elemental unit single-stepping
S        toggle calendar single-stepping
q        quit the application completely
<ESC>    stop the application and enter command-line mode
5.3.3 Command-line Operation

The following commands are available in command-line mode. At this time, command-line mode is just a framework with a few commands. More commands to query and set the system state are envisioned for future releases.

help
    Get a list of command-line mode commands.
quit
    Quit the application completely.
vars
    Show all user-settable monitor variables and their values.
speed <value>
    Set the virtual-time speed to value. The value can be any floating-point value greater than zero.
cstep [on | off]
    Set or toggle calendar single-stepping.
estep [on | off]
    Set or toggle EU single-stepping.
ctrace [on | off]
    Set or toggle calendar tracing.
etrace [on | off]
    Set or toggle EU tracing.
strace [on | off]
    Set or toggle calendar-switch tracing.
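The [on | off] argument convention shared by cstep, estep, ctrace, etrace, and strace can be summarized by a small helper: an explicit on or off sets the switch, and a missing argument toggles it. This is a hypothetical sketch of the documented behaviour, not code from the Maruti monitor. Returning -1 for an unrecognized argument is an assumption.

```c
#include <assert.h>
#include <string.h>

/* New value (0 or 1) of a monitor switch after a command such as
   "ctrace on", "ctrace off", or a bare "ctrace" (arg == NULL).
   Returns -1 if the argument is not recognized. */
int apply_switch_arg(int current, const char *arg)
{
    assert(current == 0 || current == 1);
    if (arg == NULL)
        return !current;                 /* no argument: toggle */
    if (strcmp(arg, "on") == 0)
        return 1;
    if (strcmp(arg, "off") == 0)
        return 0;
    return -1;
}
```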


Chapter  6 


Maruti  Tools  Reference 


6.1  Maruti  Builder 

The mbuild program automates the process of building a runnable Maruti application. This involves building the constituent application binaries, integrating and scheduling the application, and binding the application with the desired Maruti runtime flavor.

Mbuild is normally run in the directory in which the application config file and constituent module source files are located. It automatically finds the config file by its .cfg extension, reads it, and generates a makefile that builds the modules referenced there and then calls the other Maruti tools. Mbuild creates an obj subdirectory and puts all output files there.

If there is more than one config file in the current directory, the desired file must be specified with the -f <config file> option.

The user may optionally customize the mbuild actions by providing an Mbuild.inc file in the current directory. This file is included into the makefile generated by mbuild. In addition to providing additional build targets and dependency lines, the user may set some variables to modify the mbuild actions themselves:

FLAVORS  Default: ux+x11 ui+et4k sa+et4k. The list of runtime flavors with which to link the application.

MPC  Default: mpc. The program executed to compile MPL programs. Not normally modified by users.

MPC_FLAGS  Default: <empty>. Supplemental flags for the MPL compiler. Most GCC flags will work here. Most often the user will want to customize the include directories with -I <dir>.

CFG  Default: cfg. The program executed to interpret the MCL config file and integrate the application. Not normally modified by users.

CFG_FLAGS  Default: <empty>. Supplemental flags for the MCL integrator. Not normally modified by users.

ALLOCATOR  Default: allocator. The program executed to allocate and schedule the application. Not normally modified by users.

ALLOCATOR_FLAGS  Default: -p 1. Flags for the Allocator. See Section 6.4 below for more details.

MBIND  Default: mbind. The program executed for binding the application and runtime system. Not normally modified by users.

MBIND_FLAGS  Default: <empty>. Flags for the Mbind program. Not normally modified by users.

6.2  MPL/C  Compiler 

The MPL/C compiler (mpc) consists of a modified gcc plus some attendant scripts that post-process the compiler output. It generates a .o file for a module, plus a .eul file containing a partial elemental-unit graph to be read by the integrator.

The mpc program accepts GCC command-line options. See the gcc(1) manual page for details on the available options. The most commonly used option is -I dir, which customizes the include directories.


6.3  MCL  Integrator 

The MCL Integrator (cfg) reads the application config file (appname.cfg) and all the module elemental-unit graph files (modulename.eul), then generates and checks all the jobs, tasks, threads, and connections for the application. It outputs a loader map file (appname.ldf) and a complete application elemental-unit graph annotated with allocation and scheduling constraints and communication parameters (appname.sch). No cfg options are normally used.


6.4  Allocator/Scheduler 

The  Allocator/Scheduler  (allocator)  attempts  to  find  a  valid  allocation  for  the  application  tasks 
across  the  nodes  of  the  network,  and  a  valid  schedule  for  each  node  and  for  the  network  bus. 
The allocation and schedules are considered valid if all allocation, communication, and scheduling constraints are met.

The allocator/scheduler stops when a valid allocation and schedule is found, or when it is determined that one cannot be found. There is no attempt to load-balance the nodes or minimize network communications beyond what is needed for a minimally valid schedule. The allocator outputs an allocation information file (appname.alloc) and a calendar schedules file (appname.cal).

The allocator takes two flags:

-p <number of processors>  Default: 1. The number of processors in the target system. It should match the number of nodes defined in the config file.

-t <tdma slot size>  Default: 1000. The Time Division Multiple Access (TDMA) slot size for the network bus. This is the time, in microseconds, that each node is allotted to transmit on the network. All the nodes get a TDMA slot in turn. The TDMA slot size should stay between 1000 and 16000 microseconds, depending on the application's latency requirements and the network hardware's buffering capacity.
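The effect of the -t setting on bus latency can be estimated with a simple model of the round-robin schedule. For illustration only, the sketch assumes that the p nodes own slots in the fixed order 0, 1, ..., p-1 starting at time 0, and that a node can begin sending only at the start of its own slot. The actual slot assignment is produced by the allocator and may differ, and next_slot_start is a hypothetical name.

```c
#include <assert.h>

/* Earliest time >= t_us at which node k's TDMA slot begins, assuming p nodes
   take slot_us-microsecond slots in round-robin order 0..p-1 from t = 0. */
long next_slot_start(int k, int p, long slot_us, long t_us)
{
    assert(p >= 1 && k >= 0 && k < p && slot_us > 0 && t_us >= 0);
    long round = (long)p * slot_us;                 /* one full TDMA round */
    long start = (t_us / round) * round + (long)k * slot_us;
    if (start < t_us)
        start += round;                             /* this round's slot has begun */
    return start;
}
```

With the default 1000-microsecond slot on a hypothetical four-node system, a message that node 2 queues at t = 2001 microseconds waits until 6000 microseconds. The worst-case wait therefore approaches one full round of p times the slot size, and larger slots lengthen that round proportionally.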
6.5 Maruti Binder

The Maruti Binder (mbind) reads in the loader map .ldf file, the allocation .alloc file, and the calendars .cal file, and generates the static data structures needed by the runtime system (appname-globals.c). It also generates a makefile (appname-bind.mk) that manages the linking of each task of the application within its own logical address space, then links all tasks together with the various flavors of the runtime library.


6.6 Timing Trace Analyzer

The Timing Trace Analyzer (timestat) takes a list of timing output files as generated by the runtime system and generates a .wcet file that contains the worst-case execution times for the elemental units, as needed by the allocator. Timestat also prints other statistics generated by the runtime system.

6.7  Timing  Stats  Monitor 

Timing information is output from a stand-alone Maruti system through a serial port when the application terminates. The mgettimes program, running on another computer connected to the other end of that serial line, receives the timing data and stores it in a file suitable for processing by timestat. Mgettimes can process the output of multiple runs on the test setup, even from different applications: simply leave the program running, and any data that is received will be saved.

Mgettimes is called as follows:

mgettimes <speed> <serial-port>

where <speed> is the communications rate at which the times will be output (19200 bps in the default core), and <serial-port> is the device file for the communications port (for example, /dev/tty00 for the PC's COM1 port).
Abstract

We consider the replication problem of series-parallel (SP) task graphs, where each task may run on more than one processor. The objective of the problem is to minimize the total cost of task execution and interprocessor communication. We call it the minimum cost replication problem for SP graphs (MCRP-SP). In this paper, we adopt a new communication model where the purpose of replication is to reduce the total cost. The class of applications we consider is computation-intensive applications, in which the execution cost of a task is greater than its communication cost. MCRP-SP for such applications is proved to be NP-complete. We present a branch-and-bound method to find an optimal solution, as well as an approximation approach for a suboptimal solution. The numerical results show that such replication may lead to a lower cost than the optimal solution of the assignment problem (in which each task is assigned to only one processor). The proposed optimal solution has complexity $O(n^2 2^n M)$, while the approximation solution has polynomial complexity, where $n$ is the number of processors in the system and $M$ is the number of tasks in the graph.


1  Introduction 


Distributed computer systems have often resulted in improved reliability, flexibility, throughput, fault tolerance and resource sharing. In order to use the processors available in a distributed system, the tasks have to be allocated to the processors. The allocation problem is one of the basic problems of distributed computing, and its solution has a far-reaching impact on the usability and efficiency of a distributed system. Clearly, the tasks of an application have to be executed satisfying the precedence and other synchronization constraints among them. (Such constraints are often specified in the form of a task graph.)

In executing an application, defined by its task graph, we have the option of restricting ourselves to having only one copy of each task. The allocation problem, in this case, is referred to as the assignment problem. If, on the other hand, a task may be replicated multiple times, the general problem is called the replication problem. In this paper, we consider the replication problem and present an algorithm to find the optimal replication of series-parallel graphs for computation-intensive applications.

In distributed processing applications, the objective of the allocation problem may be the minimum completion time, processor load balancing, or the total cost of execution and communication, among others. For the assignment problem where the objective is to minimize the total cost of execution and interprocessor communication, [1][12] present algorithms for series-parallel graphs of M tasks and n processors whose running time is polynomial in n and linear in M. For general task graphs, the assignment problem has been proven in [9] to be NP-complete. Many papers [8][9][10] present branch-and-bound methods which yield the optimal result. Other heuristic methods have been considered by Lo in [7] and by Price and Krishnaprasad in [5]. All these works focus on the assignment problem.

Traditionally, the main purpose of replicating a task on multiple processors is to increase the degree of fault tolerance [2][6]. If some processors in the distributed system fail, the application may still survive using other copies. Under such a computation model, a task has to communicate with multiple copies of other tasks. As a consequence, the total cost of execution and communication of the replication problem will be larger than that of the assignment problem. In this paper, we adopt another computation model, in which a task is replicated not for the sake of fault tolerance but to decrease the total cost. Under our model, each task may have more than one copy, and a copy may start its execution as soon as it receives the necessary data from any one copy of its preceding task.

The class of applications we consider in this paper is computation-intensive applications, where the execution cost of a task is always greater than its communication cost. In this paper, we prove that for computation-intensive applications the replication problem is still NP-complete, and we present a branch-and-bound algorithm to solve it. The overall complexity of the solution is $O(n^2 2^n M)$. Note that, for a fixed number of processors $n$, the running time grows only linearly in $M$.

In the remainder of this paper, the series-parallel graph model and the computation model are described in Section 2. In Section 3, the replication problem is formulated as a minimum cost 0-1 integer programming problem, and the proof of NP-completeness is given. A branch-and-bound algorithm and some numerical results are given in Section 4. In Section 5, the overall algorithm is presented, and concluding remarks are drawn in Section 6.


2 Definitions

2.1  Graph  Model 

A series-parallel (SP) graph, $G = (V, E)$, is a directed graph of type $p$, where $p \in \{T_{unit}, T_{chain}, T_{and}, T_{or}\}$, and $G$ has a source node (of indegree 0) and a sink node (of outdegree 0). An SP graph can be constructed by recursively applying the following rules.

1. A graph $G = (V, E) = (\{v\}, \emptyset)$ is an SP graph of type $T_{unit}$. (Node $v$ is both the source and the sink of $G$.)

2. If $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are SP graphs, then $G' = (V', E')$ is an SP graph of type $T_{chain}$, where $V' = V_1 \cup V_2$ and $E' = E_1 \cup E_2 \cup \{\langle \text{sink of } G_1, \text{source of } G_2 \rangle\}$.

3. If each graph $G_i = (V_i, E_i)$ with source-sink pair $(s_i, t_i)$, where $s_i$ is of outdegree 1, is an SP graph, $\forall\, i = 1, 2, \ldots, n$, and new nodes $s' \notin V_i$ and $t' \notin V_i$, $\forall\, i$, are given, then $G' = (V', E')$ is an SP graph of type $T_{and}$ (or type $T_{or}$), where $V' = V_1 \cup V_2 \cup \cdots \cup V_n \cup \{s', t'\}$ and $E' = E_1 \cup E_2 \cup \cdots \cup E_n \cup \{\langle s', s_i \rangle \mid \forall\, i = 1, 2, \ldots, n\} \cup \{\langle t_i, t' \rangle \mid \forall\, i = 1, 2, \ldots, n\}$. The source of $G'$, $s'$, is called the forker of $G'$. The sink of $G'$, $t'$, is called the joiner of $G'$. $G'$ is an SP graph of type $T_{and}$ (or type $T_{or}$) if there exists a parallel-and (or parallel-or) relation between the $G_i$'s.


A convenient way of representing the structure of an SP graph is via a parsing tree [4]. There are four kinds of internal nodes in a parsing tree: $T_{unit}$, $T_{chain}$, $T_{and}$ and $T_{or}$ nodes. A $T_{unit}$ node has only one child, while a $T_{chain}$ node has more than one child. Every internal


The purpose of the replication problem considered in this paper is to decrease the sum of execution and communication costs. Under this consideration, there is no need to enforce plural communication between any two task instances. Hence, we propose the 1-out-of-n communication model. In this model, for each edge $\langle i, j \rangle \in E$, a task instance $t_{j,q}$ may start its execution if it receives the data from any one task instance of its predecessor, task $i$.


3  Problem  Formulation  and  Complexity 


Based on the computational model presented in Section 2.2, the problem of minimizing the total sum of execution and communication costs for an SP task graph can be approached by replication of tasks. An example where replication may lead to a lower sum of execution and communication costs is given in Figure 2, where the number of processors in the system is two, and the execution costs and communication costs are listed in the $e$ table and the $\mu$ table, respectively. If each task is allowed to run on at most one processor, the optimal allocation is to assign task a to processor 1, b to 1, c to 1, d to 2, e to 2, and f to 1, with a minimum cost of 68. However, if tasks may be replicated (e.g., task a is replicated on processors 1 and 2), the cost drops to 67.

We introduce integer variables $X_{i,p}$, $\forall\, 1 \le i \le M$ and $1 \le p \le n$, to formulate the problem, where $X_{i,p} = 1$ if task $i$ is replicated on processor $p$, and $X_{i,p} = 0$ otherwise. We define a binary function $\delta(x)$: if $x > 0$ then $\delta(x) = 1$, else $\delta(x) = 0$. We also associate an allocated flag $F(w)$ with each node $w$ in the parsing tree, where $F(w) = 1$ if the allocation for tasks in the subtree $S_w$ is valid, and $F(w) = 0$ otherwise. A valid allocation for the tasks in $S_w$ is an allocation that follows the semantics of $T_{chain}$, $T_{and}$, and $T_{or}$ subgraphs. A valid allocation is not necessarily one in which each task in $S_w$ is allocated to at least one processor: some tasks in $T_{or}$ subgraphs may be neglected without affecting the successful execution of an SP graph.

Given an SP graph $G$, its parsing tree $T(G)$ and any internal node $w$ in $T(G)$, the allocated flag $F(w)$ can be computed recursively:

1. if $w$ is a $T_{unit}$ node with a child $t$, then $F(w) = \delta\left(\sum_{p=1}^{n} X_{t,p}\right)$.

2. if $w$ is a $T_{chain}$ node with $c$ children, then $F(w) = F(child_1) \times F(child_2) \times \cdots \times F(child_c)$.

3. if $w$ is a $T_{and}$ node with forker $s$, joiner $t$ and $c$ children, then $F(w) = F(s) \times F(t) \times F(child_1) \times F(child_2) \times \cdots \times F(child_c)$.

4. if $w$ is a $T_{or}$ node with forker $s$, joiner $t$ and $c$ children, then $F(w) = F(s) \times F(t) \times \delta\left(F(child_1) + F(child_2) + \cdots + F(child_c)\right)$.
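The four rules translate directly into a recursive function over the parsing tree. In the sketch below, which is an illustration and not the authors' implementation, each task is represented by a $T_{unit}$ leaf that stores its row $X_{t,p}$ directly (rather than having a separate task child). The struct layout, the MAXP and MAXC bounds, and the 0-based processor indices are assumptions made for the example.

```c
#include <assert.h>

#define MAXP 4          /* maximum number of processors n in this sketch */
#define MAXC 4          /* maximum number of children per internal node */

enum kind { T_UNIT, T_CHAIN, T_AND, T_OR };

struct ptnode {
    enum kind kind;
    const int *x;                          /* T_UNIT: X_{t,p}, p = 0..n-1 */
    const struct ptnode *forker, *joiner;  /* T_AND / T_OR only */
    const struct ptnode *child[MAXC];      /* chain members or parallel limbs */
    int nchild;
};

static int delta(int v) { return v > 0 ? 1 : 0; }

/* Allocated flag F(w) for n processors, following rules 1-4. */
int allocated_flag(const struct ptnode *w, int n)
{
    int f = 1, sum = 0;
    assert(n >= 1 && n <= MAXP && w->nchild <= MAXC);
    switch (w->kind) {
    case T_UNIT:                            /* rule 1: at least one instance */
        for (int p = 0; p < n; p++)
            sum += w->x[p];
        return delta(sum);
    case T_CHAIN:                           /* rule 2: every member valid */
        for (int c = 0; c < w->nchild; c++)
            f *= allocated_flag(w->child[c], n);
        return f;
    case T_AND:                             /* rule 3: forker, joiner, all limbs */
        f = allocated_flag(w->forker, n) * allocated_flag(w->joiner, n);
        for (int c = 0; c < w->nchild; c++)
            f *= allocated_flag(w->child[c], n);
        return f;
    case T_OR:                              /* rule 4: forker, joiner, any limb */
        for (int c = 0; c < w->nchild; c++)
            sum += allocated_flag(w->child[c], n);
        return allocated_flag(w->forker, n) * allocated_flag(w->joiner, n)
               * delta(sum);
    }
    return 0;
}
```

For example, with the forker on processor 0, the joiner on processor 1, one branch left unallocated and the other branch on processor 0, $F$ is 1 for a $T_{or}$ node but 0 for a $T_{and}$ node with the same structure.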

The minimum cost replication problem for SP graphs, MCRP-SP, can be formulated as a 0-1 integer programming problem:

$$Z = \text{Minimize} \left[ \sum_{i,p} X_{i,p}\, e_{i,p} + \sum_{\langle i,j \rangle \in E,\ 1 \le q \le n} \ \min_{p:\ X_{i,p} = 1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) \right]$$

subject to $F(r) = 1$, where $r$ is the root of $T(G)$, and $X_{i,p} = 0$ or $1$, $\forall\, i, p$.   (1)
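For a fixed 0-1 matrix $X$, the objective of (1) can be evaluated directly, which is useful for checking hand calculations such as the one in Figure 2. The sketch below uses assumed conventions: processors are numbered from 0, costs are small integers, $\mu$ is stored per edge, and a task with no selected instance contributes no communication cost. The numbers in the usage note are hypothetical and are not the data of Figure 2.

```c
#include <assert.h>
#include <limits.h>

#define N 2                     /* processors (illustrative) */

/* Objective of (1) for a fixed replication matrix X[i][p] (tasks 0..m-1).
   Each selected instance t_{i,p} pays e[i][p]; for edge k = <i,j> and each
   selected t_{j,q}, only the cheapest message from a selected t_{i,p} is
   charged (1-out-of-n model). */
int replication_cost(int m, int X[][N], int e[][N],
                     int nedges, int edge[][2], int mu[][N][N])
{
    int cost = 0;
    for (int i = 0; i < m; i++)
        for (int p = 0; p < N; p++)
            cost += X[i][p] * e[i][p];          /* execution costs */
    for (int k = 0; k < nedges; k++) {
        int i = edge[k][0], j = edge[k][1];
        assert(i >= 0 && i < m && j >= 0 && j < m);
        for (int q = 0; q < N; q++) {
            if (!X[j][q])
                continue;
            int best = INT_MAX;                 /* min over selected senders */
            for (int p = 0; p < N; p++)
                if (X[i][p] && mu[k][p][q] < best)
                    best = mu[k][p][q];
            if (best != INT_MAX)                /* task i has an instance */
                cost += best;
        }
    }
    return cost;
}
```

In a hypothetical three-task fork with edges ⟨0,1⟩ and ⟨0,2⟩, e = {{3, 3}, {6, 12}, {12, 6}}, and μ equal to 0 on the same processor and 5 across processors, placing task 0 and task 1 on processor 0 and task 2 on processor 1 costs 20, while additionally replicating task 0 on processor 1 lets each successor receive its input locally and lowers the cost to 18.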

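The restricted problem, which allows each task to run on at most one processor, has the following formulation:

$$Z = \text{Minimize} \left[ \sum_{i,p} X_{i,p}\, e_{i,p} + \sum_{\langle i,j \rangle \in E,\ p,q} \mu_{i,j}(p,q)\, X_{i,p}\, X_{j,q} \right]$$

subject to $\sum_{p=1}^{n} X_{i,p} \le 1$ and $F(r) = 1$, where $r$ is the root of $T(G)$ and $X_{i,p} = 0$ or $1$, $\forall\, i, p$.   (2)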

The task assignment problem (2) for SP graphs of M tasks on n processors has been solved in time polynomial in n and linear in M [12]. However, multiprocessor task assignment without replication for general task graphs has been reported to be NP-complete [9]. The MCRP-SP problem can also be shown to be NP-complete. In this paper, we solve the problem for computation-intensive applications and present an algorithm whose running time is linear in the number of tasks when the number of processors is fixed.


3.1  Assignment  Graph 


Bokhari [1] introduced the assignment graph to solve the task assignment problem (2). To prove the NP-completeness of problem (1) and to solve it, we also adopt the concept of the assignment graph; the assignment graph of an SP graph can be defined similarly. The following definitions apply to the assignment graph, and an assignment graph for an SP graph is drawn in Figure 3.

1. It is a directed graph with weighted nodes and edges.

2. It has $M \times n$ nodes. Each weighted node is labeled with a task instance $t_{i,p}$.

3. A layer $i$ is the collection of $n$ weighted nodes $(t_{i,1}, t_{i,2}, \ldots, t_{i,n})$. Each layer of the graph corresponds to a node in the SP graph. The layer corresponding to the source (sink) is called the source (sink) layer.

4. A part of the assignment graph that corresponds to an SP subgraph of type $T_{chain}$, $T_{and}$ or $T_{or}$ is called a $T_{chain}$, $T_{and}$ or $T_{or}$ limb, respectively.

5. Communication costs are accounted for by giving the weight $\mu_{i,j}(p,q)$ to the edge going from $t_{i,p}$ to $t_{j,q}$.

6. Execution costs are assigned to the corresponding weighted nodes.

Given an assignment graph, Bokhari [1] solves Problem (2) by selecting one weighted node from each layer and including the weighted edges between any two selected nodes. The resulting subgraph is called an allocation graph. To solve Problem (1), more than one weighted node from each layer may be chosen. Similarly, a replication graph for Problem (1) can be constructed from an assignment graph by including all selected nodes and the edges between these nodes. Examples of an allocation graph and a replication graph are shown in Figure 4 for the assignment graph shown in Figure 3. Note that for each selected node in layer $i$ of the replication graph, there is only one edge incident to it from each predecessor layer of $i$.

In a replication graph, each layer may have more than one selected node. Let the variable $X_i = (X_{i,1}, X_{i,2}, \ldots, X_{i,n})$ be a replication vector for layer $i$ in a replication graph. We define the minimum activation cost of vector $X_i$ for layer $i$, $A_i(X_i)$, to be the minimum sum of the weights of all possible nodes and edges leading to the selected nodes of layer $i$ in a replication graph. Then the goal of Problem (1) can be achieved by computing the minimal value of $A_{sink}(X_{sink}) + \sum_{p=1}^{n} X_{sink,p}\, e_{sink,p}$ over all possible values of $X_{sink}$.
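For the special case of a chain, the activation costs can be computed layer by layer, with each replication vector $X_i$ encoded as a bitmask over the $n$ processors. The sketch below is a brute-force illustration of the definition of $A_i(X_i)$ and of the final minimization over $X_{sink}$. It enumerates all pairs of vectors for consecutive layers, costing $O(4^n n^2)$ per edge, and is not the paper's branch-and-bound algorithm. The array layout and names are assumptions, with processors numbered from 0.

```c
#include <assert.h>
#include <limits.h>

#define NPROC 2                  /* n: processors (illustrative) */
#define NSET  (1 << NPROC)       /* vectors X_i as bitmasks; mask 0 is unused */

/* Minimum of objective (1) for a chain t_0 -> t_1 -> ... -> t_{m-1}.
   e[i][p]: execution cost of t_{i,p}; mu[i][p][q]: cost of edge <i, i+1>
   from processor p to processor q. */
int chain_min_cost(int m, int e[][NPROC], int mu[][NPROC][NPROC])
{
    int A[NSET], next[NSET];
    assert(m >= 1);
    for (int s = 1; s < NSET; s++)
        A[s] = 0;                            /* nothing leads into the source layer */
    for (int i = 0; i + 1 < m; i++) {
        for (int t = 1; t < NSET; t++) {     /* candidate X_{i+1} */
            next[t] = INT_MAX;
            for (int s = 1; s < NSET; s++) { /* candidate X_i */
                int c = A[s];
                for (int p = 0; p < NPROC; p++)
                    if (s & (1 << p))
                        c += e[i][p];        /* execution of selected t_{i,p} */
                for (int q = 0; q < NPROC; q++) {
                    if (!(t & (1 << q)))
                        continue;
                    int best = INT_MAX;      /* 1-out-of-n: cheapest selected sender */
                    for (int p = 0; p < NPROC; p++)
                        if ((s & (1 << p)) && mu[i][p][q] < best)
                            best = mu[i][p][q];
                    c += best;
                }
                if (c < next[t])
                    next[t] = c;             /* A_{i+1}(X_{i+1}) */
            }
        }
        for (int t = 1; t < NSET; t++)
            A[t] = next[t];
    }
    int z = INT_MAX;                         /* min over X_sink of A + sink execution */
    for (int s = 1; s < NSET; s++) {
        int c = A[s];
        for (int p = 0; p < NPROC; p++)
            if (s & (1 << p))
                c += e[m - 1][p];
        if (c < z)
            z = c;
    }
    return z;
}
```

With the hypothetical costs e = {{4, 6}, {5, 5}, {6, 4}} and μ equal to 0 on the same processor and 2 across processors, the minimum is 15, and every optimal choice selects a single instance per task, which is what Lemmas 2 and 3 below lead one to expect for a computation-intensive chain, since a chain has no forkers.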

3.2  Complexity 

To show that Problem (1) for a computation-intensive application is NP-complete, we first prove the following lemmas.

Lemma 1: For any layer $i$ in the replication graph, the minimum activation cost for two selected nodes $t_{i,p}$ and $t_{i,q}$ is always greater than or equal to that for either node $t_{i,p}$ or $t_{i,q}$ alone.

Proof: The lemma can be proven by contradiction. Let $A_1$ be the minimum activation cost for the two nodes $t_{i,p}$ and $t_{i,q}$, and let $A_2$ and $A_3$ be the minimum activation costs for $t_{i,p}$ alone and $t_{i,q}$ alone, respectively. Assume that $A_1 < A_2$ and $A_1 < A_3$. Since $A_1$ includes the activation cost of node $t_{i,p}$, an activation cost $c$ for $t_{i,p}$ alone can be obtained from $A_1$. The value $c$ is not necessarily the minimum value for $t_{i,p}$; hence $A_2 \le c$. The value $c$ is obtained by removing some weighted nodes and edges from the replication graph, which implies that $c \le A_1$. Hence $A_2 \le A_1$, which contradicts the assumption. The same reasoning applied to $A_3$ also reaches a contradiction. Therefore, the assumption is incorrect and Lemma 1 holds.


□ 

Lemma 1 can be further extended to the cases where more than two weighted nodes are chosen. The conclusion we can draw is that the more nodes are selected from a layer, the larger (or at least no smaller) the activation cost is.

Lemma 2: Given a computation-intensive application with its SP task graph $G = (V, E)$ and its assignment graph, if node $i$ has outdegree one and edge $\langle i, j \rangle \in E$, then for any vector $X_j$, the minimal activation cost $A_j(X_j)$ can be obtained by choosing only one weighted node from layer $i$ (i.e., $\sum_{p=1}^{n} X_{i,p} = 1$).

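Proof: The lemma can be proven by contradiction. Since node $i$ has outdegree one and edge $\langle i, j \rangle \in E$, we know that

$$A_j(X_j) = \min_{X_i} \left[ A_i(X_i) + \sum_{p=1}^{n} X_{i,p}\, e_{i,p} + \sum_{q=1}^{n} \min_{p:\ X_{i,p}=1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) \right].$$

Let us assume that the expression in brackets reaches its minimal value $m$ when more than one node from layer $i$ is selected, and that the optimal replication vector is $X_i^*$. Since $\sum_{p=1}^{n} X^*_{i,p} > 1$, we may remove one selected node from layer $i$ and obtain a new vector $X_i'$. Without loss of generality, let us remove $t_{i,r}$. Removing node $t_{i,r}$ yields a new value $m'$. Since $m$ is the minimum value for layer $i$, it follows that $m \le m'$.

From Lemma 1, we obtain $A_i(X_i') \le A_i(X_i^*)$. For a computation-intensive application, $\sum_{q=1}^{n} \mu_{i,j}(p,q) < \min_{p'} e_{i,p'}$ holds for all $1 \le p \le n$. Let $p_0$ be any processor that is still selected in $X_i'$. Then

$$\begin{aligned}
m' &= A_i(X_i') + \sum_{p=1}^{n} X'_{i,p}\, e_{i,p} + \sum_{q=1}^{n} \min_{p:\ X'_{i,p}=1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) \\
&\le A_i(X_i^*) + \sum_{p=1}^{n} X'_{i,p}\, e_{i,p} + \sum_{q=1}^{n} \min_{p:\ X'_{i,p}=1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) \\
&= A_i(X_i^*) + \Big( \sum_{p=1}^{n} X^*_{i,p}\, e_{i,p} - e_{i,r} \Big) + \sum_{q=1}^{n} \min_{p:\ X'_{i,p}=1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) \\
&\le A_i(X_i^*) + \sum_{p=1}^{n} X^*_{i,p}\, e_{i,p} + \sum_{q=1}^{n} \mu_{i,j}(p_0,q) - \min_{p'} e_{i,p'} \\
&< A_i(X_i^*) + \sum_{p=1}^{n} X^*_{i,p}\, e_{i,p} \\
&\le A_i(X_i^*) + \sum_{p=1}^{n} X^*_{i,p}\, e_{i,p} + \sum_{q=1}^{n} \min_{p:\ X^*_{i,p}=1} \big( \mu_{i,j}(p,q)\, X_{j,q} \big) = m.
\end{aligned}$$

The result, $m' < m$, contradicts $m \le m'$. Hence the assumption is wrong and Lemma 2 holds.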


□ 

Lemma 3: Given a computation-intensive application with its SP task graph $G$, the minimum cost can be achieved by considering only the replication of the forkers.

Proof: We prove the lemma by contradiction. Let the minimum cost for the task replication problem be $z_0$ if only the forkers (i.e. tasks with outdegree > 1) are allowed to run on more than one processor. Assume the total cost can be reduced further by replicating some task i which is not a forker. Then there are two possible cases for i:

1.  i  has  outdegree  0. 

2.  i  has  outdegree  1. 

In case 1, i is the sink of the whole graph. Also, i may be the joiner of some SP subgraphs. Suppose i is allowed to run on an extra processor b, which is different from the processor to which i is initially assigned (when $z_0$ is obtained). Then the new cost will be $z_0 + e_{i,b} + \sum_{\langle d,i\rangle \in E} \min_{p:\,X_{d,p}=1} \mu_{d,i}(p,b)$. Clearly, the new cost is greater than $z_0$. This contradicts our assumption that the total cost can be reduced further by replicating task i.

In case 2, i has one successor. Let $\langle i,j\rangle \in E$. From the assumption, we know that the replication of i can reduce the total cost. Hence, the minimum activation cost $A_j(X_j)$ for the task instances in layer j is obtained when task i is replicated onto more than one processor. This contradicts Lemma 2. Hence, the assumption is incorrect, and the minimum cost can be achieved by considering only the replication of the forkers.


□ 

Lemma 3 shows that, given an SP graph, Problem (1) for computation-intensive applications can be solved if we can find the optimal replication for the forkers. We now show that the problem of finding an optimal replication for the forkers in an SP graph is NP-complete. First, a special form of the replication problem is introduced.

The Uni-Cost Task Replication (UCTR) problem is stated as follows:

INSTANCE: Graph $G' = (V', E')$, $V' = V_1' \cup V_2'$, where $|V_1'| = n$ and $|V_2'| = m$. If $x \in V_1'$ and $y \in V_2'$, then edge $\langle x,y\rangle \in E'$ (i.e. $|E'| = m \times n$). For each $x \in V_1'$, there is an activation cost m. Associated with each edge $\langle x,y\rangle \in E'$, there is a communication cost $\mu_{x,y} = n \times m$ or 0. A positive integer $K \le n \times m$ is also given.

QUESTION: Is there a feasible subset $V_k \subseteq V_1'$ such that

$$\sum_{x \in V_k} m + \sum_{y \in V_2'} \min_{x \in V_k} \mu_{x,y} \le K \qquad (3)$$


[Theorem  1]:  Uni-Cost  Task  Replication  problem  is  NP-Complete. 

[Proof]: The problem is in NP because a subset $V_k$, if it exists, can be checked to see whether the sum of activation costs and communication costs is less than or equal to K. We shall now transform the VERTEX COVER [3] problem to this problem. Given any graph G = (V, E) and an integer $C < |V|$, we construct a new graph $G' = (V', E')$ with $V' = V_1' \cup V_2'$. The construction ensures that there exists a VERTEX COVER of size C or less in G if and only if there is a feasible subset of $V_1'$ in G'.

Let $|V| = n$ and $|E| = m$. To construct G':

1. We create a vertex $v_x$ for each node in V.
2. We number the edges in E.
3. We create a vertex $b_y$ for each edge $\langle u,v\rangle \in E$, where $u, v \in V$.

We define $K = m \times C$, $V_1' = \{v_1, v_2, \ldots, v_n\}$, $V_2' = \{b_1, b_2, \ldots, b_m\}$, and $E' = \{\langle v_x, b_y\rangle \mid v_x \in V_1', b_y \in V_2'\}$. Let $\mu_{x,y} = 0$ if $v_x$ is an end point of the edge corresponding to vertex $b_y$, and $\mu_{x,y} = n \times m$ otherwise. An illustration, where n = 7 and m = 9, is shown in Figure 5.

Let us now argue that there exists a vertex cover of size C or less in G if and only if there is a feasible subset of $V_1'$ in G' for which the sum of activation cost and communication cost is $m \times C$ or less. Suppose there is a vertex cover of size C. Then for each vertex $b_y$ (= $\langle u,v\rangle$) in $V_2'$, at least one of u and v belongs to the vertex cover. By selecting all the vertices in the vertex cover into the subset of $V_1'$, the sum in Eq. (3) will be $m \times C$. Since $C < n$, it follows that $m \times C < n \times m$.

Conversely, consider any feasible subset $V_k \subseteq V_1'$ whose total cost is equal to or less than $mC$. We can see that the second term of Eq. (3) (i.e. the sum of communication costs) must be zero. Suppose that, for some $b_y \in V_2'$, the minimum communication cost between $b_y$ and the vertices in $V_k$ is nonzero. Then the communication cost will be at least $m \times n$. Since $C < n$, it follows that $m \times n > m \times C$. The total cost in Eq. (3) would then be greater than $m \times C$, which is a contradiction. Thus the minimum communication cost between any vertex in $V_2'$ and the vertices in $V_k$ is zero. This means that at least one of the two end points of each edge in E belongs to $V_k$. There are at most C vertices in $V_k$, because the activation cost of each vertex is m. Therefore, by selecting the vertices in $V_k$, we obtain a vertex cover of size C or less in G.


□ 
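The construction in the proof of Theorem 1 is small enough to check exhaustively on tiny graphs. The Python sketch below uses hypothetical helper names and indexes vertices and edges from 0. It builds the UCTR instance from a graph and evaluates the left-hand side of Eq. (3). It then confirms, on random 5-vertex graphs, that a vertex cover of size C or less exists exactly when some subset of $V_1'$ achieves cost $m \times C$ or less. Brute force is used only because the instances are tiny; it says nothing about efficient solvability.

```python
import itertools
import random

def uctr_from_vertex_cover(n, edges):
    """UCTR instance of the proof: mu[x][y] = 0 if v_x is an end point of
    edge y, and n*m otherwise; every activation cost equals m."""
    m = len(edges)
    mu = [[0 if x in edges[y] else n * m for y in range(m)] for x in range(n)]
    return m, mu

def uctr_cost(subset, m, mu):
    """Left-hand side of Eq. (3): activation costs plus, for every b_y,
    the cheapest communication cost to a selected vertex."""
    activation = m * len(subset)
    communication = sum(min(mu[x][y] for x in subset) for y in range(m))
    return activation + communication

def uctr_feasible(n, m, mu, K):
    return any(uctr_cost(s, m, mu) <= K
               for k in range(1, n + 1)
               for s in itertools.combinations(range(n), k))

def has_vertex_cover(n, edges, C):
    return any(all(u in s or v in s for u, v in edges)
               for k in range(C + 1)
               for s in itertools.combinations(range(n), k))

rng = random.Random(0)
for _ in range(50):
    n = 5
    pairs = list(itertools.combinations(range(n), 2))
    edges = rng.sample(pairs, rng.randint(1, len(pairs)))
    m, mu = uctr_from_vertex_cover(n, edges)
    for C in range(1, n):                  # the proof assumes C < |V|
        assert has_vertex_cover(n, edges, C) == uctr_feasible(n, m, mu, m * C)
```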

[Theorem  2]:  The  problem,  MCRP-SP  for  computation-intensive  applications,  is  NP-complete. 

[Proof]: From Lemma 3, we know that only the forker in an SP graph of type Tand needs to run on more than one processor. Consider the following recognition version of Problem (1) for SP graphs of type Tand:

We are given a distributed system of n processors, an SP graph $G^a = (V^a, E^a)$ of type Tand, its assignment graph H, and two positive integers m and r. Let r be a multiple of m, and let

- $V^a = \{s, t, 1, 2, \ldots, r\}$,
- $E^a = \{\langle s,i\rangle \mid i = 1, 2, \ldots, r\} \cup \{\langle i,t\rangle \mid i = 1, 2, \ldots, r\}$.

Task s (t) is the forker (joiner) of $G^a$. Execution cost $e_{i,p}$ and communication cost $\mu_{i,j}(p,q)$ are defined in H, $\forall \langle i,j\rangle \in E^a$ and $\forall\, 1 \le p,q \le n$. Integer variable $X_{i,p} = 1$ if task i is assigned to processor p, and $X_{i,p} = 0$ otherwise. Given a positive integer $K \le r$, is there an assignment of $X_{i,p}$'s such that

$$\sum_{p=1}^{n} X_{s,p}\, e_{s,p} + \sum_{i \ne s}\sum_{p=1}^{n} X_{i,p}\, e_{i,p} + \sum_{\langle i,j\rangle \in E^a}\sum_{q=1}^{n} X_{j,q} \min_{p:\,X_{i,p}=1} \mu_{i,j}(p,q) \le K, \qquad (4)$$

where $\sum_{p} X_{i,p} = 1$ for all $i \ne s$ and $\sum_{p} X_{s,p} \ge 1$?

We shall transform the UCTR problem to this problem. Given any graph $G' = (V_1' \cup V_2', E')$ considered in the UCTR problem, we construct an SP graph of type Tand, $G^a = (V^a, E^a)$, and its assignment graph H. The construction ensures that G' has a feasible subset of $V_1'$ for which the sum in Eq. (3) is K or less if and only if there is an assignment of $X_{i,p}$'s for $G^a$ and H that satisfies Eq. (4).

Let $|V_1'| = n$ and $|V_2'| = m$, so that the unit cost is $f = n \times m$. We set $r = m \times f$ ($= n \times m^2$) and take $K \le n \times m$ from the UCTR instance. The forker and joiner of $G^a$ are s and t, respectively. Then $V^a = \{s, t, 1, 2, \ldots, r\}$ and $E^a = \{\langle s,i\rangle \mid i = 1, 2, \ldots, r\} \cup \{\langle i,t\rangle \mid i = 1, 2, \ldots, r\}$. We assign the execution costs and communication costs in H as follows. An illustration, where m = 2 and n = 3, is shown in Figure 7.

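• $\forall\, 1 \le p \le n$: $e_{s,p} = m$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$: if p = 1, then $e_{i,p} = 0$; otherwise $e_{i,p} = r$.

• $\forall\, 1 \le p \le n$: if p = 1, then $e_{t,p} = 0$; otherwise $e_{t,p} = r$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$: let $g = (i - 1) \operatorname{div} (m \times n)$, where div is integer division. If the UCTR cost $\mu_{p,g+1}$ between the p-th vertex of $V_1'$ and the (g+1)-th vertex of $V_2'$ is nonzero, then $\mu_{s,i}(p,1) = 1$; otherwise $\mu_{s,i}(p,1) = 0$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p \le n$, $\forall\, q \ne 1$: $\mu_{s,i}(p,q) = 0$.

• $\forall\, 1 \le i \le r$, $\forall\, 1 \le p,q \le n$: $\mu_{i,t}(p,q) = 0$.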

It is easy to verify that the SP graph constructed by the above rules is of type Tand and computation-intensive. For each node in $V_2'$ of G', we create f nodes in $G^a$. The communication cost between each of these nodes and source s is either one or zero.

Let us now argue that there exists a feasible subset of $V_1'$ for the UCTR problem if and only if there exists a valid assignment of $X_{i,p}$'s such that the total sum in Eq. (4) is K or less.

Suppose a feasible subset $V_k$ of $V_1'$ exists such that the sum in Eq. (3) is C ($\le K$). Let $V_k$ be $\{v_{a_1}, v_{a_2}, \ldots\}$. Then we can obtain a valid assignment as follows:

- $X_{i,1} = 1, X_{i,2} = 0, \ldots, X_{i,n} = 0$, $\forall\, 1 \le i \le r$;
- $X_{t,1} = 1, X_{t,2} = 0, \ldots, X_{t,n} = 0$;
- $X_{s,p} = 1$ if $v_p \in V_k$, and $X_{s,p} = 0$ if $v_p \notin V_k$, $\forall\, 1 \le p \le n$.

Each node y in $V_2'$ corresponds to f nodes in $V^a$. Hence the communication cost between node y and any node $v_p$ in $V_k$ equals the total communication cost between these f nodes and the task instance $t_{s,p}$ of source s in H. By summing up all the costs, we obtain that the total sum is C. Since $C \le K \le n \times m \le r$, this is a valid assignment.

Conversely, suppose there exists an assignment of $X_{i,p}$'s such that the sum in Eq. (4) is K or less. Then $X_{i,1} = 1, X_{i,2} = 0, \ldots, X_{i,n} = 0$ must hold $\forall\, 1 \le i \le r$, and $X_{t,1} = 1, X_{t,2} = 0, \ldots, X_{t,n} = 0$ must hold as well. Otherwise, if $X_{i,p} = 1$ for some $p \ne 1$, the sum would be greater than r, which conflicts with $K \le r$. Hence the second term in Eq. (4) must be zero. We may then obtain a subset of $V_1'$ for the UCTR problem by selecting node $v_p \in V_1'$ whenever $X_{s,p}$ equals 1. Since the first term in Eq. (3) is equivalent to the first term in Eq. (4), the total sum for the UCTR problem will also be K or less.


□ 


4  Optimal  Replication  for  SP  Graphs  of  Type  Tand 

In this section, we develop a branch-and-bound algorithm to find an optimal solution for Tand subgraphs. By Lemma 3, the non-forker nodes only need to run on one processor. Hence, an optimal assignment of the non-forker nodes can be done after an optimal replication for the forkers is obtained.

4.1  A  Branch-and-Bound  Method  for  Optimal  Replication 

Consider a Tand SP graph with forker-joiner pair (s, h) shown in Figure 6. There are B subgraphs connected by s and h, and these B subgraphs have a parallel-and relationship. The joiner h has only one copy in an optimal solution (i.e. $\sum_p X_{h,p} = 1$). We therefore decompose the minimum cost replication problem $\mathcal{P}$ for a Tand SP graph into n subproblems $\mathcal{P}_q$, q = 1, 2, ..., n, where $\mathcal{P}_q$ is to find the minimum cost when the joiner is assigned to processor q (i.e. $X_{h,q} = 1$).

We are given:

- a joiner instance $t_{h,q}$;
- subgraphs $G_b$, b = 1, 2, ..., B;
- the minimum costs $C^b_{p,q}$ between each forker instance $t_{s,p}$ and joiner instance $t_{h,q}$, $\forall\, 1 \le p \le n$ and $1 \le b \le B$.

We further decompose problem $\mathcal{P}_q$ into n subproblems $\mathcal{P}_q^k$, k = 1, 2, ..., n, where k is the number of replicated copies of forker s. That is, $\mathcal{P}_q^k$ is the problem of finding an optimal replication for k copies of forker s when the joiner h is assigned to processor q. Since the problem of finding an optimal replication for forker s is NP-complete, we propose a branch-and-bound algorithm for each subproblem $\mathcal{P}_q^k$.

We sort the forker instances by their execution costs $e_{s,p}$ into non-decreasing order. Without loss of generality, we assume $e_{s,1} \le e_{s,2} \le \cdots \le e_{s,n}$. We represent all the possible combinations in which s may be replicated by a combination tree with $\binom{n}{k}$ leaf nodes. Examining all $\binom{n}{k}$ combinations is time-consuming, so we do not do so. Instead, we apply a least-cost branch-and-bound algorithm that finds an optimal solution by traversing a small portion of the combination tree.

During the search, we maintain a variable z that records the minimum value known so far. The search proceeds by expanding intermediate nodes. Each intermediate node v at level y represents a combination of y out of n forker instances. The expansion of node v generates at most n - y child nodes. Each child node inherits the y forker instances from v and adds one distinct forker instance. For example, if node v is represented by $\langle t_{s,i_1}, \ldots, t_{s,i_y}\rangle$, where $i_1 < \cdots < i_y$, then $\langle t_{s,i_1}, \ldots, t_{s,i_y}, t_{s,i_y+j}\rangle$ represents a possible child node of v, $\forall\, 1 \le j \le n - i_y$. A combination tree with k = 4 and n = 6 is shown in Figure 8.

At any intermediate node of a combination tree, we apply an estimation function to compute the least cost this node can achieve. If the estimated cost is not less than z, we prune the node, and no further expansion of it is necessary. Otherwise, we insert the node, along with its estimated cost, into a queue. The queue is kept in non-decreasing order of estimated cost, and its first node is always the next one to be expanded. When the expansion reaches a leaf node, the actual cost of that leaf is computed, and z is updated if this cost is less than z. The algorithm terminates when the queue is empty.
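The child-generation rule fits in a few lines of Python. In this illustrative sketch, which is not the authors' code, a node is a sorted tuple of 1-based instance indices. The root is assumed to be the empty tuple. Each child appends one index larger than the node's last index. Enumerating level k reproduces the $\binom{n}{k}$ leaves of the combination tree.

```python
from math import comb

def children(node, n):
    """Children of a combination-tree node (a sorted tuple of 1-based indices):
    append one index i_y + j with 1 <= j <= n - i_y."""
    last = node[-1] if node else 0
    return [node + (last + j,) for j in range(1, n - last + 1)]

def level(n, k):
    """All nodes at level k, i.e. the leaves of the tree for subproblem P_q^k."""
    frontier = [()]                      # root: no forker instance chosen yet
    for _ in range(k):
        frontier = [c for v in frontier for c in children(v, n)]
    return frontier

print(children((1, 3), 6))               # [(1, 3, 4), (1, 3, 5), (1, 3, 6)]
print(len(level(6, 4)), comb(6, 4))      # 15 15
```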


4.1.1  The  Estimation  Function 

The proposed branch-and-bound algorithm is characterized by its estimation function. Let node v be at level y of the combination tree associated with subproblem $\mathcal{P}_q^k$, and let it be represented by $\langle t_{s,i_1}, t_{s,i_2}, \ldots, t_{s,i_y}\rangle$, where $i_1 < \cdots < i_y$. Any leaf node reachable from node v needs k - y more forker instances.

Let $\xi = \langle t_{s,j_1}, t_{s,j_2}, \ldots, t_{s,j_{k-y}}\rangle$ be a tuple of k - y instances chosen from the remaining $n - i_y$ instances, where $i_y < j_1 < j_2 < \cdots < j_{k-y}$. Let $\Xi$ be the set of all possible $\xi$'s, and let g(v) be the smallest cost among all leaf nodes reachable from node v:


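$$g(v) = \sum_{a=1}^{y} e_{s,i_a} + \min_{\xi \in \Xi}\Big[\sum_{t_{s,j} \in \xi} e_{s,j} + \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y\} \text{ or } t_{s,p} \in \xi} C^b_{p,q}\Big]$$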


Computing g(v) exactly requires examining all $\binom{n-i_y}{k-y}$ tuples in $\Xi$. We therefore use the following estimation function est(v) to approximate g(v):


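$$est(v) = \sum_{a=1}^{y} e_{s,i_a} + \sum_{j=i_y+1}^{i_y+k-y} e_{s,j} + \sum_{b=1}^{B} \min_{p = i_1, \ldots, i_y,\, i_y+1, i_y+2, \ldots, n} \big(C^b_{p,q}\big) \qquad (5)$$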


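Since

$$\sum_{j=i_y+1}^{i_y+k-y} e_{s,j} \le \sum_{t_{s,j} \in \xi} e_{s,j} \quad \text{and} \quad \sum_{b=1}^{B} \min_{p = i_1, \ldots, i_y,\, i_y+1, \ldots, n} \big(C^b_{p,q}\big) \le \sum_{b=1}^{B} \min_{p \in \{i_1, \ldots, i_y\} \text{ or } t_{s,p} \in \xi} C^b_{p,q} \qquad \forall\, \xi \in \Xi,$$

it is easy to see that $est(v) \le g(v)$. The first inequality holds because the instances are sorted by execution cost, so $t_{s,i_y+1}, \ldots, t_{s,i_y+k-y}$ are the k - y cheapest remaining instances. The second holds because the minimum on the left is taken over a superset. Hence, we use est(v) as the lower bound of the objective function at node v.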
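The bound $est(v) \le g(v)$ can also be checked numerically. The Python sketch below rests on the following assumptions:

- indices are 0-based;
- the execution costs `e` are already sorted in non-decreasing order;
- `C[b][p]` stands for $C^b_{p,q}$ with the joiner instance q fixed.

`g` enumerates every tuple $\xi$, while `est` follows Eq. (5).

```python
import itertools
import random

def cost(S, e, C):
    """Cost of a leaf: execution costs plus the cheapest instance per subgraph."""
    return sum(e[p] for p in S) + sum(min(row[p] for p in S) for row in C)

def g(v, k, e, C):
    """Exact g(v): best leaf below v, enumerating every tuple xi in Xi."""
    start = v[-1] + 1 if v else 0
    return min(cost(v + xi, e, C)
               for xi in itertools.combinations(range(start, len(e)), k - len(v)))

def est(v, k, e, C):
    """Estimate of Eq. (5): the k - y cheapest remaining execution costs plus,
    per subgraph, the best instance among v and all remaining instances."""
    start = v[-1] + 1 if v else 0
    exec_part = sum(e[p] for p in v) + sum(e[start:start + k - len(v)])
    cand = list(v) + list(range(start, len(e)))
    return exec_part + sum(min(row[p] for p in cand) for row in C)

rng = random.Random(3)
n, k, B = 6, 3, 4
e = sorted(rng.randint(1, 50) for _ in range(n))
C = [[rng.randint(1, 50) for _ in range(n)] for _ in range(B)]
for y in range(k + 1):
    for v in itertools.combinations(range(n), y):
        start = v[-1] + 1 if v else 0
        if n - start >= k - y:               # some leaf is reachable from v
            assert est(v, k, e, C) <= g(v, k, e, C)
```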


4.1.2  The  Proposed  Algorithm 

The branch-and-bound algorithm has three parameters:

- the joiner instance $t_{h,q}$;
- the number of processors on which forker s is allowed to run (k);
- the up-to-date minimum cost (z).

The algorithm BB(k, q, z) is shown in Table 1.

The MCRP-SP problem for a Tand SP graph can be solved by invoking BB(k, q, z) $n^2$ times with different parameter values. BB(k, q, z) solves the problem $\mathcal{P}_q^k$, while the whole procedure, shown in Table 2, solves $\mathcal{P}$.
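For experimentation, Tables 1 and 2 can be transcribed into Python as below. This is a hedged sketch, not the authors' implementation, and it makes several assumptions:

- indices are 0-based, and `C_by_joiner[q][b][p]` stands for $C^b_{p,q}$;
- the joiner's own execution cost is assumed to be already folded into these values;
- a Python heap plays the role of the sorted queue;
- nodes from which level k can no longer be reached are skipped, which is a small addition to Table 1.

The counters `EIM` and `VLF` are incremented at the places corresponding to lines 13 and 08 of Table 1.

```python
import heapq
import itertools
import random

def cost(S, e, C):
    """Objective for a set S of forker instances (joiner instance fixed)."""
    return sum(e[p] for p in S) + sum(min(row[p] for p in S) for row in C)

def est(v, k, e, C):
    """Lower bound of Eq. (5); e must be sorted in non-decreasing order."""
    start = v[-1] + 1 if v else 0
    exec_part = sum(e[p] for p in v) + sum(e[start:start + k - len(v)])
    cand = list(v) + list(range(start, len(e)))
    return exec_part + sum(min(row[p] for p in cand) for row in C)

def bb(k, e, C, z, stats):
    """Sketch of BB(k, q, z) from Table 1 for one joiner instance q.
    Returns the (possibly improved) bound z and the best leaf found, or None."""
    n, best = len(e), None
    queue = [(0, ())]                        # root v0: no instance chosen yet
    while queue:
        _, u = heapq.heappop(queue)          # node with the smallest estimate
        last = u[-1] if u else -1
        for p in range(last + 1, n):         # generate the children of u
            v = u + (p,)
            if len(v) == k:                  # leaf: exact cost (line 08)
                stats["VLF"] += 1
                c = cost(v, e, C)
                if c < z:
                    z, best = c, v
            elif n - p - 1 >= k - len(v):    # skip dead ends (not in Table 1)
                bound = est(v, k, e, C)
                if bound < z:                # otherwise the node is pruned
                    stats["EIM"] += 1        # line 13
                    heapq.heappush(queue, (bound, v))
    return z, best

def opt(e, C_by_joiner, stats):
    """Sketch of OPT() from Table 2; C_by_joiner[q][b][p] stands for C^b_{p,q}.
    Returns (minimum cost, joiner processor q, chosen forker processors)."""
    order = sorted(range(len(e)), key=lambda p: e[p])      # line 01
    es = [e[p] for p in order]
    results = []
    for q, C in enumerate(C_by_joiner):
        Cs = [[row[p] for p in order] for row in C]
        best = (0,)                                        # cheapest single copy
        z = cost(best, es, Cs)                             # lines 03-06
        for k in range(1, len(e) + 1):                     # lines 07-08
            z_k, found = bb(k, es, Cs, z, stats)
            if found is not None:
                z, best = z_k, found
        results.append((z, q, tuple(sorted(order[p] for p in best))))
    return min(results)

# Compare with exhaustive search on small random instances.
rng = random.Random(7)
for _ in range(20):
    n, B = 5, 3
    e = [rng.randint(1, 50) for _ in range(n)]
    Cq = [[[rng.randint(1, 50) for _ in range(n)] for _ in range(B)] for _ in range(n)]
    stats = {"EIM": 0, "VLF": 0}
    z, q, S = opt(e, Cq, stats)
    exact = min(cost(T, e, Cq[j]) for j in range(n)
                for k in range(1, n + 1) for T in itertools.combinations(range(n), k))
    assert z == exact == cost(S, e, Cq[q])
```

Because est(v) never exceeds g(v), pruning a node whose estimate is not below z cannot discard a strictly cheaper leaf. The result should therefore match exhaustive search, which the random check at the end of the block tests on small instances.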


4.2  Performance  Evaluation 

The essence of the branch-and-bound algorithm is the expansion of the intermediate nodes. When a node is removed from the queue, its children are generated and their estimated values are computed. If the estimation function gives a tight lower bound on the objective function, the number of expanded nodes should be small, and an optimal solution can be found quickly.

We conduct two sets of experiments to evaluate the performance of the proposed solution. The performance indices are the number of enqueued intermediate nodes (EIM) and the number of visited leaf nodes (VLF) during the search. We measure EIM and VLF by inserting one counter for each index at lines 13 and 08 of Table 1, respectively. Each time execution reaches line 13 (08), EIM (VLF) is incremented by 1.

The first set of experiments uses SP graphs of type Tand in which the communication cost between any two task instances is arbitrary, generated by a random number generator within the range [1, 50]. The execution cost of each task instance is also randomly generated within the same range. The second set of experiments uses SP graphs of type Tand constrained to computation-intensive applications. We vary the problem size by assigning different values to the number of processors in the system (n) and the number of parallel-and subgraphs connected by the forker and joiner (B). For each problem size (n, B), we randomly generate and solve 50 problem instances. The results, including the average values of EIM and VLF over the 50 instances, are summarized in Table 3.

Table 3 shows that the proposed method significantly reduces the number of expansions for intermediate nodes and leaf nodes. For example, for problem size (n, B) = (20, 40), an exhaustive search would visit $2^{20}$ (= 1,048,576) leaf nodes. Our algorithm generates only 16,857 nodes on average, because of est(v), the bound z, and the branch-and-bound approach.

The branch-and-bound approach and the estimation function perform even better for computation-intensive applications: the EIM and VLF values are much smaller in Set II than in Set I. This is because, in computation-intensive applications, the optimal number of replications for the forker is smaller than in general applications. The z value in function OPT() reflects this fact and avoids unnecessary expansions.


5  Sub-Optimal  Replication  for  SP  Graphs  of  Type  Tand


The branch-and-bound algorithm in section 4.1 yields an optimal solution for Tand subgraphs. However, its worst-case complexity is exponential. Hence, we also consider finding a near-optimal solution in polynomial time.


5.1  Approximation  Method 


For the problem $\mathcal{P}_q$ defined in section 4.1, we use an approximation approach that solves it in polynomial time. The approach is based on iterative selection in a dynamic programming fashion. We are given:

- a joiner instance $t_{h,q}$;
- subgraphs $G_b$, b = 1, 2, ..., B;
- the minimum costs $C^b_{p,q}$ between $t_{s,p}$ and $t_{h,q}$, for p = 1, 2, ..., n and b = 1, 2, ..., B.

We define Sub(p, b) to be the sub-optimal solution for the replication of forker s when forker instances $t_{s,1}, t_{s,2}, \ldots, t_{s,p}$ and subgraphs $G_1, G_2, \ldots, G_b$ are taken into consideration.

Strategy  1:

Sub(p, b) can be obtained from Sub(p - 1, b) by considering one more forker instance $t_{s,p}$. Strategy 1 consists of two steps.

1. Initialize Sub(p, b) to Sub(p - 1, b) and determine whether $t_{s,p}$ should be included in Sub(p, b). If so, add it.
2. Examine whether any instances in Sub(p - 1, b) should be removed. Because $t_{s,p}$ may have been included in the first step, removing some instances $t_{s,i}$, i < p, and reassigning the communications of some subgraphs $G_b$ from those $t_{s,i}$'s to $t_{s,p}$ may lower the cost.

Strategy  2: 

Sub(p, b) can also be obtained from Sub(p, b - 1) by taking one more subgraph $G_b$ into account. Initially, Sub(p, b) is set to Sub(p, b - 1).

1. Choose the best forker instance for $G_b$ from $t_{s,1}, t_{s,2}, \ldots, t_{s,p}$, and call it $t_{s,x}$.
2. Check whether $t_{s,x}$ is already in Sub(p, b). If it is not, check a condition to decide whether $t_{s,x}$ should be added. After $t_{s,x}$ is added, some instances may be removed and the communications reassigned to achieve a lower cost.

We compare the two results obtained from the above strategies and assign the one with the lower cost to Sub(p, b). By computing in this dynamic programming fashion, Sub(n, B) can be obtained. The algorithm and its graphical interpretation are shown in Figure 9.
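The two strategies can be combined into a small dynamic program. The Python sketch below is only one plausible reading of section 5.1 and Figure 9, and it makes the following assumptions:

- indices are 0-based, and `C[c][p]` stands for $C^{c}_{p,q}$ with q fixed;
- Reassign&Remove is assumed to mean "connect every subgraph to its cheapest selected instance and drop unused instances";
- the figure does not spell out Strategy 2's condition, so $t_{s,x}$ is assumed to be added whenever doing so lowers the cost.

The helper names are hypothetical.

```python
import itertools
import random

def cost(S, e, C, b):
    """Cost of instance set S when only subgraphs 0..b-1 are taken into account."""
    return sum(e[p] for p in S) + sum(min(C[c][p] for p in S) for c in range(b))

def reassign_remove(S, C, b):
    """Assumed Reassign&Remove: connect each subgraph c < b to its cheapest
    instance in S, then keep only the instances that are still used."""
    return frozenset(min(S, key=lambda p: C[c][p]) for c in range(b))

def approx(e, C):
    """Sub(p, b) table of section 5.1 (p is 0-based, b counts subgraphs).
    Returns Sub(n, B) as a frozenset of forker instances."""
    n, B = len(e), len(C)
    sub = {}
    for p in range(n):
        for b in range(1, B + 1):
            if p == 0:
                sub[p, b] = frozenset({0})              # only the first instance exists
                continue
            # Strategy 1: one more forker instance p (rule of Figure 9)
            S = sub[p - 1, b]
            gain = sum(max(0, min(C[c][i] for i in S) - C[c][p]) for c in range(b))
            s1 = reassign_remove(S | {p}, C, b) if e[p] < gain else S
            if b == 1:
                sub[p, b] = s1
                continue
            # Strategy 2: one more subgraph, index b - 1
            S = sub[p, b - 1]
            x = min(range(p + 1), key=lambda i: C[b - 1][i])
            s2 = S
            if x not in S:
                T = reassign_remove(S | {x}, C, b)
                if cost(T, e, C, b) < cost(S, e, C, b):   # assumed condition
                    s2 = T
            sub[p, b] = min(s1, s2, key=lambda T: cost(T, e, C, b))
    return sub[n - 1, B]

print(sorted(approx([10, 10], [[1, 100], [100, 1]])))   # [0, 1]

rng = random.Random(5)
for _ in range(30):
    n, B = 5, 4
    e = [rng.randint(1, 50) for _ in range(n)]
    C = [[rng.randint(1, 50) for _ in range(n)] for _ in range(B)]
    exact = min(cost(T, e, C, B) for k in range(1, n + 1)
                for T in itertools.combinations(range(n), k))
    assert cost(approx(e, C), e, C, B) >= exact        # never below the optimum
```

Each cell is filled from two neighbouring cells, so the table has n × B entries. This matches the structure, though not necessarily the exact per-step work, described in section 5.2.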

5.2  Performance  Evaluation 

The complexity of each strategy described in section 5.1 is O(nB). Solving Sub(n, B) requires invoking strategies 1 and 2 n × B times, so the total complexity of solving Sub(n, B) by the approximation method is $O(n^2B^2)$.

We conduct a set of experiments to evaluate the performance of the approximation method. For each problem size (n, B), we randomly generate 50 instances and solve them with both the approximation method and exhaustive search. The computation and communication costs are drawn from the uniform distribution over the range [1, 50]. We compare the minimum cost obtained from exhaustive search (EXHAUST) with the costs from the approximation (APPROX) and from the single assignment solution (SINGLE). The optimal single assignment solution is the one in which only one forker instance is allowed; it is obtained with the shortest path algorithm [1].

The results are summarized in Table 4. They show that the approximation method closely approximates the minimum cost, whereas the error of the single-copy solution is at least 20%. This again shows that replication can lead to a lower cost than an optimal assignment does.
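The error measures of Table 4 are simple to compute once EXHAUST and SINGLE are available. The function names in the sketch below are illustrative. The paper computes SINGLE with the shortest path algorithm; the sketch replaces it with direct enumeration over the n single-instance choices, which should yield the same optimum for a fixed joiner instance and given $C^b_{p,q}$ values. A two-subgraph example shows how a single copy can be far more expensive than a replicated forker.

```python
import itertools

def cost(S, e, C):
    """Execution costs of the chosen forker instances plus, for each subgraph,
    the cheapest connection C[b][p] (joiner instance fixed)."""
    return sum(e[p] for p in S) + sum(min(row[p] for p in S) for row in C)

def exhaust(e, C):
    """EXHAUST: minimum over every non-empty set of forker instances."""
    n = len(e)
    return min(cost(S, e, C) for k in range(1, n + 1)
               for S in itertools.combinations(range(n), k))

def single(e, C):
    """SINGLE: best solution that keeps exactly one forker instance."""
    return min(cost((p,), e, C) for p in range(len(e)))

def error_pct(value, exact):
    """Relative error used in Table 4, in percent."""
    return (value - exact) / exact * 100.0

e = [10, 10]
C = [[1, 100], [100, 1]]          # two subgraphs, each cheap on a different processor
print(single(e, C), exhaust(e, C), round(error_pct(single(e, C), exhaust(e, C)), 1))
# 111 22 404.5
```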

6  Solution  of  MCRP-SP  for  computation-intensive  applications 

6.1  The  Solution 

Given a computation-intensive application with its SP graph, we first generate its parsing tree and assignment graph. The algorithm then finds the minimum weight replication graph from the assignment graph, and the optimal solution is obtained from that replication graph.

The algorithm traverses the parsing tree in postfix order. Let $S_x$ denote the subtree induced by an intermediate node x together with all of x's descendant nodes. During the traversal, an optimal solution for $S_x$ can be found only after the optimal solutions for x's descendant nodes have been found.

Given an SP graph G and a distributed system S, each subtree $S_x$ in the parsing tree T(G) corresponds one-to-one to a limb in the assignment graph of G on S. Whenever a child node b of x is visited, the corresponding limb in the assignment graph is replaced as follows:

- with a two-layer Tchain limb if b is a Tchain- or Tor-type node;
- with a one-layer limb if b is a Tand-type node.

The algorithm is shown in Table 5. A graphical demonstration of how the algorithm solves the problem is shown in Figure 10.

Before a Tchain limb is replaced (i.e. when x is a Tchain-type node), each constituent child limb has already been replaced with a one-layer limb or a two-layer Tchain limb. Hence, the shortest path algorithm [1] can be used to compute the weights of the new edges between each node in the source layer and each node in the sink layer of the new Tchain limb. Transforming the limb of an intermediate node x with M children into a two-layer limb (lines 05 to 08 of Table 5) has complexity $O(Mn^3)$. Figure 10 illustrates the replacement of a Tchain limb from parts (b) to (c) and from parts (d) to (e).
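Once every child limb has been reduced to source-to-sink weight matrices, the Tchain replacement amounts to repeated min-plus matrix products, which is where the $O(n^3)$ cost per child comes from. The Python sketch below makes assumptions that the text does not state. Each limb is given as an n × n matrix of path weights. The weights of the nodes in a shared intermediate layer are added when a path passes through them, while the end layers' node weights are accounted for elsewhere.

```python
import itertools
import random

def min_plus(W1, node_w, W2):
    """Merge two consecutive limbs that share a layer: the new weight from
    source node a to sink node c is min over b of W1[a][b] + node_w[b] + W2[b][c]."""
    return [[min(W1[a][b] + node_w[b] + W2[b][c] for b in range(len(node_w)))
             for c in range(len(W2[0]))]
            for a in range(len(W1))]

def collapse_chain(limbs, shared):
    """limbs[k]: n x n edge weights of the k-th two-layer limb in the chain;
    shared[k]: node weights of the layer between limbs[k] and limbs[k + 1].
    Returns the edge weights of the equivalent two-layer Tchain limb."""
    W = limbs[0]
    for k in range(1, len(limbs)):
        W = min_plus(W, shared[k - 1], limbs[k])   # O(n^3) per merge
    return W

# Brute-force check over every path through the two shared layers.
rng = random.Random(2)
n = 3
limbs = [[[rng.randint(0, 9) for _ in range(n)] for _ in range(n)] for _ in range(3)]
shared = [[rng.randint(0, 9) for _ in range(n)] for _ in range(2)]
W = collapse_chain(limbs, shared)
for a, c in itertools.product(range(n), repeat=2):
    best = min(limbs[0][a][b1] + shared[0][b1] + limbs[1][b1][b2]
               + shared[1][b2] + limbs[2][b2][c]
               for b1, b2 in itertools.product(range(n), repeat=2))
    assert W[a][c] == best
```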

To replace a Tand limb, we have to compute the $C^b_{p,q}$'s. These values can also be computed by the shortest path algorithm, so lines 16 and 17 are dominated by shortest path computations.

According to the computational model in section 2.2, each task instance of s may start its execution once it receives the necessary data from any task instance of its predecessor d. From Lemma 2, the minimum sum of initialization costs of multiple task instances of s is always obtained from only one task instance of d. Therefore, the initialization of task instance $t_{s,p}$ depends on which task instance of d it communicates with. This is why, on line 19, the communication cost $\mu_{d,s}(z,p)$ is added to the execution cost $e_{s,p}$ before OPT() is invoked.

The most significant part of the replacement is computing the weights on the new edges from the source layer to the sink layer. This cost is dominated by the calls to OPT() and, in the worst case, is $O(n^2 2^n)$. On average, however, our OPT function performs well and reduces the complexity significantly. Figure 10 illustrates the replacement of a Tand limb from parts (c) to (d).
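The role of line 19 can be illustrated as follows. In this Python sketch, the names are hypothetical and indices are 0-based. Each instance z of the predecessor d gets its own adjusted execution costs $E_{s,p} = \mu_{d,s}(z,p) + e_{s,p}$. The weight of the new edge from $t_{d,z}$ to $t_{h,q}$ is then the best replication cost under those costs. An exhaustive search stands in for OPT() to keep the example short; the branch-and-bound function could be substituted.

```python
import itertools

def replication_cost(E, C_q):
    """Exhaustive stand-in for OPT(): minimum over non-empty sets S of
    sum_{p in S} E[p] + sum_b min_{p in S} C_q[b][p]."""
    n = len(E)
    return min(sum(E[p] for p in S) + sum(min(row[p] for p in S) for row in C_q)
               for k in range(1, n + 1)
               for S in itertools.combinations(range(n), k))

def replace_and_limb(mu_ds, e_s, C):
    """mu_ds[z][p]: communication cost from t_{d,z} to t_{s,p};
    e_s[p]: execution cost of t_{s,p}; C[q][b][p]: stands for C^b_{p,q}.
    Returns W with W[z][q] = weight of the new edge from t_{d,z} to t_{h,q}."""
    n = len(e_s)
    W = []
    for z in range(len(mu_ds)):
        E = [mu_ds[z][p] + e_s[p] for p in range(n)]   # E_{s,p} as in Table 5
        W.append([replication_cost(E, C_q) for C_q in C])
    return W

# Two processors, one joiner instance, two subgraphs that prefer different processors.
print(replace_and_limb([[0, 5], [5, 0]], [1, 1], [[[0, 10], [10, 0]]]))    # [[7], [7]]
print(replace_and_limb([[0, 50], [50, 0]], [1, 1], [[[0, 10], [10, 0]]]))  # [[11], [11]]
```

In the second call, the high communication cost from d makes the far instance of s expensive, so replication no longer pays off and a single copy is chosen.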

We also consider using the approximation method to find a sub-optimal replacement of a Tand limb. In that case, function OPT() in line 21 is replaced with Sub(n, B), and the total complexity becomes polynomial in n and B.

Finally, to replace a Tor limb with B subgraphs connected between the forker and the joiner, the complexity is $O(Bn^2)$ for the new edges and $O(Bn^2)$ for the $C^b_{p,q}$'s. Figure 10 illustrates the replacement of a Tor limb from parts (a) to (b).

When the traversal reaches the root node of the parsing tree, the result of FIND() is either a single layer or two layers, depending on the type of the root node. It then remains only to select the lightest of these n (single layer) or $n^2$ (two layers) shortest path combinations.

An optimal replication graph itself is found by combining the shortest paths, saved earlier, between the selected nodes. The whole algorithm has the complexity

$$O(A\,n^2 2^n) + \sum_i O(B_i\,n^2) + \sum_i O(C_i\,n^3),$$

where A is the number of Tand limbs, $B_i$ is the number of subgraphs in the ith Tor limb, and $C_i$ is the number of layers in the ith Tchain limb. This is not greater than $O(M n^2 2^n)$, where M is the total number of tasks in the SP graph. If the number of processors n is fixed, the complexity of the algorithm is a linear function of M.

6.2  Concluding  Remarks

This paper has focused on MCRP-SP, the optimal replication problem of SP task graphs for computation-intensive applications. Replication aims to reduce inter-processor communication and to fully utilize the processing power of distributed systems. We use the SP graph model, which is widely used to model applications in distributed systems. The applications considered are computation-intensive, i.e. the execution cost of a task is greater than its communication cost.

We prove that MCRP-SP is NP-complete and present branch-and-bound and approximation methods for SP graphs of type Tand. The numerical results show that the algorithm performs very well and avoids much unnecessary searching. Finally, we present an algorithm that solves the MCRP-SP problem for computation-intensive applications. The proposed optimal solution has a worst-case complexity of $O(n^2 2^n M)$, while the approximation solution has a complexity polynomial in n and M, where n is the number of processors in the system and M is the number of tasks in the graph.

For applications in which the communication cost between two tasks is greater than the execution cost of a task, replication can still reduce the total cost. However, in the extreme case where the execution cost of each task is zero, the optimal allocation is to assign each task to one processor. We are studying optimal replication for the general case.
Figure  1:  An  SP  graph  and  its  parsing  tree
Figure  2:  An  example  to  show  how  the  replication  can  reduce  the  total  cost
Figure  7:  An  illustration  of  how  to  transform  a  UCTR  instance  to  a  Tand  SP  graph


Table 1: Function BB(k, q, z): branch-and-bound algorithm for solving problem $\mathcal{P}_q^k$

01  Initialize the queue to be empty;
02  Insert root node $v_0$ into the queue;
03  While the queue is not empty do begin
04      Remove the first node u from the queue;
05      Generate all child nodes of u;
06      For each generated child node v do begin
07          If v is a leaf node (i.e. v is at level k) then
08              Compute g(v) by setting $\Xi$ to be $\emptyset$;
09              Set z = min(z, g(v));
10          else begin /* v is an intermediate node */
11              Compute est(v) by (5);
12              If est(v) < z then
13                  Insert v into the queue according to est(v);
14          end;
15      end;
16  end;
17  Return(z).


Table 2: Function OPT($C^b_{p,q}$'s, $e_{s,p}$'s): the optimal solution of MCRP-SP of type Tand when the $C^b_{p,q}$'s and $e_{s,p}$'s are given

01  Sort the $t_{s,p}$'s into non-decreasing order by the values of the $e_{s,p}$'s;
02  For q = 1 to n do begin
03      Let node v be a leaf node at level 1;
04      Set v to be $\langle t_{s,1}\rangle$ and k to be 1;
05      Compute g(v) by setting $\Xi$ to be $\emptyset$;
06      Initialize z to be g(v);
07      For k = 1 to n do
08          z = BB(k, q, z);
09      Set c(q) = z;
10  end;
11  Output the combination with the minimum value among c(1), c(2), ..., c(n).


Figure 9: Pseudo code, graphical demonstration, and dynamic programming table for approximation methods

Sub(p-1, b) → Sub'(p, b):
    If $e_{s,p} < \sum_{c=1}^{b}\big([\min_{t_{s,i} \in Sub(p-1,b)} C^c_{i,q}] - C^c_{p,q}\big)^+$ then
    begin
        Sub'(p, b) = Sub(p-1, b) ⊕ $t_{s,p}$;
        Reassign&Remove(Sub'(p, b))
    end
    Else Sub'(p, b) = Sub(p-1, b)

Legend: $(x)^+ = x$ if $x > 0$; $(x)^+ = 0$ if $x \le 0$.

Sub(p, b-1) → Sub''(p, b):
    Let $t_{s,x}$ be the instance that satisfies $\min_{1 \le x \le p} C^b_{x,q}$;
    If $t_{s,x} \in Sub(p, b-1)$ then
        Sub''(p, b) = Sub(p, b-1)
    Else if <condition for adding $t_{s,x}$> then
    begin
        Sub''(p, b) = Sub(p, b-1) ⊕ $t_{s,x}$;
        Reassign&Remove(Sub''(p, b))
    end
    Else Sub''(p, b) = Sub(p, b-1)

[Graphical demonstration: Sub(p-1, b) with subgraph $G_b$ yields Sub'(p, b); Sub(p, b-1) yields Sub''(p, b); both feed Sub(p, b) in the dynamic programming table.]

Sub(p, b) = Min_Cost(Sub'(p, b), Sub''(p, b))


Table  3:  Computation  Results  for  branch-and-bound  approach 


Set I | Set II
EIM  VLF | EIM  VLF
n
Total Number of leaves ($2^n$)


6 


1,065  4,161 


2,88 


2,026  12,042


3,579  18,866 


0,551  27,018 


51 


62 


68 


78 


84 


86 


340 


398 


543 


617 


720 


780 


1,175


256 


4,096 


4,096 


4,096 


4,096 


4,096 


4,096


1,711 

65,536 

2,496

65,536 

3,127 

65,536 

"3,493 

65,536 

4,510 

65,536 

3,079 

1,048,576 

5,280 

1,048,576

- - - - - 

1,048,576 


1,048,576 


1,048,576 


1,048,576


3,086  16,857 


Each  value  shown  is  the  average  value  over  50  runs. 


Table  4:  Simulation  Results  for  Approximation  Method 


single error% = (SINGLE − EXHAUST) / EXHAUST × 100%.

approx error% = (APPROX − EXHAUST) / EXHAUST × 100%.
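Both error measures are relative errors against the exhaustive-search optimum. A minimal Python sketch, assuming SINGLE, APPROX and EXHAUST are the total costs produced by the respective methods on the same problem instance (the function name `error_percent` is hypothetical):

```python
def error_percent(heuristic_cost: float, exhaustive_cost: float) -> float:
    """Relative excess of a heuristic cost over the exhaustive-search optimum, in percent."""
    if exhaustive_cost == 0:
        raise ValueError("exhaustive cost must be non-zero")
    return (heuristic_cost - exhaustive_cost) / exhaustive_cost * 100.0
```

Passing the SINGLE or the APPROX cost as the first argument gives single error% or approx error%, respectively.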


Table 5: Algorithm FIND(S_x): the algorithm for finding the shortest path combinations from the

limb which corresponds to the subtree S_x induced by an intermediate node x and all of x's descendant
nodes in a parsing tree
Case of the type of intermediate node x:

Type Tchain :

For b = the first child node of x to the last one do

FIND(S_b); /* Now the limb corresponding to S_b is replaced */

Replace the limb corresponding to S_x with a two-layer Tchain limb where
the source (sink) layer of the old limb is the source (sink) layer of the new 2-layer limb;
Put weights on the edges between source and sink layers equal to the shortest path
between the corresponding nodes;

Type Tand : /* Let x = [ Tand, forker s, joiner h ] */

Let d be the predecessor of forker s in C (i.e. < d,s > ∈ F);

Let B be the number of child nodes of x in the parsing tree;

/* I.e. there are B subgraphs connected by s and h */

For b = the first child node of x to the B-th child of x do

FIND(S_b); /* Now the limb corresponding to S_b is replaced */

For p = 1 to n, q = 1 to n and b = 1 to B do

Compute the minimum replication cost from t_{s,p} to t_{h,q} w.r.t. child b ;
For i = 1 to n do begin

For p = 1 to n do E_{s,p} = f_{d,s}(i,p) + e_{s,p} ;

/* accounts for initialization by t_{d,i} and execution cost itself. */

For q = 1 to n do f_{d,h}(i,q) = OPT(C_{p,q}'s, E_{s,p}'s) ;

/* Create new edges from t_{d,i}'s to t_{h,q}'s */

end;

Replace the Tand limb with a Tunit limb, where source layer = sink layer = layer h,
and there are new edges from layer d to layer h;

Type Tor : /* Let x = [ Tor, forker s, joiner h ] */

Use the same method described above from lines 12 to 17 to compute


p.? 


Replace the Tor limb with a two-layer Tchain limb, where
the source (sink) layer of the Tor limb is the source (sink) layer of the Tchain limb and
l)  =  V  p  and  g  ; 

end  case; 

Save  the  shortest  paths  between  any  node  in  source  layer  and  any  node 
in  sink  layer  for  future  reference. 
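The Tchain case of FIND replaces a chain of layers by one two-layer limb whose edge weights are source-to-sink shortest-path lengths. For a layered chain this amounts to a sequence of min-plus matrix products. The Python sketch below assumes every cost sits on an edge between consecutive layers (any node execution cost folded into its incoming edges) and uses `math.inf` for missing edges; `min_plus` and `collapse_chain` are hypothetical names, not part of the original algorithm:

```python
import math
from typing import List

Matrix = List[List[float]]
INF = math.inf  # weight of a missing edge


def min_plus(X: Matrix, Y: Matrix) -> Matrix:
    """Min-plus product: result[i][k] = min over j of X[i][j] + Y[j][k]."""
    return [[min(X[i][j] + Y[j][k] for j in range(len(Y))) for k in range(len(Y[0]))]
            for i in range(len(X))]


def collapse_chain(layers: List[Matrix]) -> Matrix:
    """Collapse a chain of layer-to-layer weight matrices into one source-to-sink matrix.

    layers[l][i][j] is the weight of the edge from node i of layer l to node j of
    layer l+1; the result holds shortest-path lengths from each source-layer node
    to each sink-layer node.
    """
    weights = layers[0]
    for nxt in layers[1:]:
        weights = min_plus(weights, nxt)
    return weights
```

Because min-plus multiplication is associative, the layers may be collapsed in any grouping; the left-to-right fold above costs O(L·n³) for L layers of n nodes each.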
Abstract

Traditional control systems have been designed to exercise control at regularly spaced time
instants. When a discrete version of the system dynamics is used, a constant sampling interval is
assumed and a new control value is calculated and exercised at each time instant. In this paper
we formulate a new control scheme, temporal control, in which we not only calculate the control
value but also decide the time instants when the new values are to be used. Taking as an example
a discrete, linear, time-invariant system and a cost function which reflects a cost for computing the
control values, we show the feasibility of using this scheme. We formulate the
temporal control scheme as a feedback scheme and, through a numerical example, demonstrate
the significant reduction in cost through the use of temporal control.
1 Introduction


Control systems have been used for the control of dynamic systems by generating and exercising
control signals. The traditional approach for feedback control has been to define the control signals,
u(t), as a function of the current state of the system, x(t). As the state of the system changes
continuously, the controls change continuously, i.e., they are defined as functions of time, t, such
that time is treated as a continuous variable. When computers are used for implementing
control systems, due to the discrete nature of computation, time is treated as a discrete variable
obtained by sampling the time axis at regular intervals of Δ seconds. Many standard control
formulations are defined for the discrete version of the system, with system dynamics expressed at
discrete time instants. In these formulations the system dynamics and the control are expressed as
sequences, x(k) and u(k).

Most traditional control systems were designed for dedicated controllers which had only
one function: to accept the state values, x(k), and generate the control, u(k). However, when a
general-purpose computer is used as a controller, it has the capability to perform other functions and may,
therefore, be used for them. Thus, it may be desirable to take into account the cost of computations
and consider control laws which do not compute a new value of the control at every instant.
When no control is to be exercised, the computer may be used for other functions. In this paper
we formulate such a control law and show how it can be used for the control of systems, achieving
nearly the same degree of control as traditional control systems while reducing computation costs by changing
the control only at a few, specific time instants. We term this temporal control.

To the best of our knowledge, this approach to the design and implementation of controls has not
been studied in the past. However, taking computation time delay into consideration for real-time
computer control has been studied in several research papers [1, 5, 6, 9, 11, 13]. All of these
papers concentrated on examining computation time delay effects and compensating for them while
maintaining the assumption of exercising controls at regularly spaced time instants.

The basic idea of temporal control is to determine not only the values for u but also the time
instants at which the values are to be calculated and changed. The control values are assumed
to remain constant between changes. By exercising control over the time instants of changes, the
designer has an additional degree of freedom for optimization. In this paper we present the idea and
demonstrate its feasibility through an example using a discrete, linear, and time-invariant system.
The same idea can, in principle, be extended to continuous-time as well as nonlinear systems.

The paper is organized as follows. In Section 2, we formulate the temporal control problem and
introduce computation cost into the performance index function. The solution approach for the temporal
control scheme is discussed in Section 3. In Section 4, implementation issues are addressed. We
provide an example of controlling a rigid body satellite in Section 5. In this example, an optimal
temporal controller is designed. Results show that the temporal control approach performs better
than the traditional sampled data control approach with the same number of control exercises.
Section 6 deals with the application of temporal controls to the design of real-time control systems.
Finally, in Section 7, we present our conclusions.

2  Problem  Formulation 

In temporal control, the number of control changes and their exercising time instants within the
controlling interval [0, T_f] are decided so as to minimize a cost function. To formulate the temporal control
problem for a discrete, linear time-invariant system, we first discretize the time interval [0, T_f] into
M subintervals of length Δ = T_f/M. Let D_M = {0, Δ, 2Δ, ..., (M − 1)Δ} denote the M regularly
spaced time instants. Here, control exercising time instants are restricted to
D_M for the purpose of simplicity. The linear time-invariant controlled process is described by the
difference equation

$$
x(k+1) = Ax(k) + Bu(k), \qquad y(k) = Cx(k) \qquad (1)
$$

where k is the time index. One unit of time represents the subinterval Δ, whereas x ∈ ℝ^n and
u are the state and input vectors, respectively.

It is well known that there exists an optimal control law [4]

$$
u^\circ(i) = f[x(i)], \qquad i = 0, 1, \ldots, M-1 \qquad (2)
$$

that minimizes the quadratic performance index function (cost)

$$
J_M = \sum_{k=0}^{M-1}\left[x^T(k)Qx(k) + u^T(k)Ru(k)\right] + x^T(M)Qx(M) \qquad (3)
$$

where Q is positive semi-definite and R is positive definite.

As we can see, a traditional controller exercises control at every time instant in D_M. However,
in temporal control, we are no longer constrained to exercise control at every time instant in D_M.
Therefore, we want to find an optimal control law, δ and g, for i = 0, 1, ..., M − 1:

$$
u^\circ(i) = \begin{cases} u^\circ(i-1) & \text{if } \delta(i) = 0 \\ g[x(i)] & \text{if } \delta(i) = 1 \end{cases} \qquad (4)
$$

that minimize a new performance index function

$$
J'_M = \sum_{k=0}^{M-1}\left[x^T(k)Qx(k) + u^T(k)Ru(k)\right] + x^T(M)Qx(M) + \sum_{k=0}^{M-1}\delta(k)\,\mu = J_M + C_M \qquad (5)
$$

Here, μ is the computation cost of getting a new control value at a time instant, and C_M = Σ_{k=0}^{M−1} δ(k)μ
denotes the total computation cost. Note that ν = Σ_{k=0}^{M−1} δ(k) is the number of
control changes. Also, let D_ν = {t_0, t_1, t_2, ..., t_{ν−1}} consist of the control changing time instants, where
t_0 = 0, t_1 = n_1Δ, ..., t_{ν−1} = n_{ν−1}Δ. That is, n_0, n_1, n_2, ..., n_{ν−1} are the indices of the control
changing time instants, and δ(n_i) = 1 for i = 0, 1, 2, ..., ν − 1.

With this new setting we need to choose ν, D_ν, and the control input values to find an optimal
controller which minimizes J'_M. This new cost function differs from J_M in two respects. First,
computational cost is introduced into J'_M as the C_M term, which regulates the number of control
changes chosen. If we do not take this computation cost into consideration, ν is likely to become
M. If the computation cost is high (i.e., μ has a large value), then ν is likely to be small in order to
minimize the total cost function. Second, in temporal control we seek not only an optimal control
law u(x(t)), but also the control exercising time instants and the number of control changes. In the
next section, we present in detail specific techniques for finding an optimal temporal control law.

3 Temporal Control

We develop a three-step procedure for finding an optimal temporal controller.

Step 1. Find an optimal control law given ν and D_ν.
Step 2. Find the best D_ν given ν.
Step 3. Find the best ν.

First, in the following two subsections (3.1 and 3.2) we derive a temporal control law which
minimizes the cost function J'_M when D_ν is given, i.e., both the time instants and the number of controls
are fixed. Since ν and D_ν are fixed, C_M is a constant, so we can use J_M defined in (5) as a cost function instead of
J'_M. Second, assume that ν is fixed but D_ν can vary. Then we present an algorithm in Section
3.3 to find a D_ν such that J_M (and J'_M) is minimized. Third, we vary ν from 1 to ν_max
to search for an optimal D_ν at which temporal control should be exercised. Section 3.4 presents this
iteration procedure. Section 3.5 explains how to incorporate terminal state constraints into the
above procedure of obtaining an optimal temporal control law. A complete algorithm of the
above procedure is described in Section 3.6. Finally, in Section 3.7 we explain how to obtain optimal
temporal controllers over an initial state space.

3.1 Closed-loop Temporal Control with Given D_ν

Assume that ν and D_ν are given. Then a new control input calculated at t_i will be applied to the
actuator for the next time interval, from t_i to t_{i+1}. Our objective here is to determine the optimal
control law

$$
u^\circ(n_i) = g[x(n_i)], \qquad i = 0, 1, \ldots, \nu-1 \qquad (6)
$$

that minimizes the quadratic performance index function (cost) J_M, which is defined in (5).

Figure 1: Decomposition of J_M into F_i's (panels: State Cost, Control Input Cost).

The principle of optimality, developed by Richard Bellman [2, 3], is the approach used here. That
is, if a closed-loop control u°(n_i) = g[x(n_i)] is optimal over the interval t_0 ≤ t ≤ t_ν, then it is also
optimal over any sub-interval t_m ≤ t ≤ t_ν, where 0 ≤ m ≤ ν. As can be seen from Figure 1, the
total cost J_M can be decomposed into F_i's for 0 ≤ i ≤ ν, where


$$
F_i = x^T(n_i)Qx(n_i) + x^T(n_i+1)Qx(n_i+1) + x^T(n_i+2)Qx(n_i+2) + \cdots + x^T(n_{i+1}-1)Qx(n_{i+1}-1) + (n_{i+1}-n_i)\,u^T(n_i)Ru(n_i) \qquad (7)
$$

That is, from (1),

$$
\begin{aligned}
F_i ={}& x^T(n_i)Qx(n_i) + \left(Ax(n_i)+Bu(n_i)\right)^T Q\left(Ax(n_i)+Bu(n_i)\right) \\
&+ \left(A^2x(n_i)+ABu(n_i)+Bu(n_i)\right)^T Q\left(A^2x(n_i)+ABu(n_i)+Bu(n_i)\right) + \cdots \\
&+ \left(A^{n_{i+1}-n_i-1}x(n_i)+A^{n_{i+1}-n_i-2}Bu(n_i)+\cdots+ABu(n_i)+Bu(n_i)\right)^T Q \\
&\qquad \left(A^{n_{i+1}-n_i-1}x(n_i)+A^{n_{i+1}-n_i-2}Bu(n_i)+\cdots+ABu(n_i)+Bu(n_i)\right) \\
&+ (n_{i+1}-n_i)\,u^T(n_i)Ru(n_i)
\end{aligned} \qquad (8)
$$

This can be rewritten as

$$
F_i = x^T(n_i)Qx(n_i) + \sum_{j=1}^{n_{i+1}-n_i-1}\left[A_jx(n_i)+B_ju(n_i)\right]^T Q\left[A_jx(n_i)+B_ju(n_i)\right] + (n_{i+1}-n_i)\,u^T(n_i)Ru(n_i) \qquad (9)
$$

where A_j = A^j and B_j = Σ_{l=0}^{j−1} A^l B, so that x(n_i + j) = A_j x(n_i) + B_j u(n_i) while u(n_i) is held.
With n_ν = M, the last term F_ν corresponds to the terminal cost x^T(M)Qx(M).
Then J_M can be expressed as

$$
J_M = F_0 + F_1 + F_2 + \cdots + F_\nu \qquad (10)
$$

Let S_m be the cost from i = ν − m + 1 to i = ν:

$$
S_m = F_{\nu-m+1} + F_{\nu-m+2} + \cdots + F_\nu, \qquad 1 \le m \le \nu+1 \qquad (11)
$$

These cost terms are illustrated in Figure 1 above.

Therefore, by applying the principle of optimality, we can first minimize S_1 = F_ν, then choose
F_{ν−1} to minimize S_2 = F_{ν−1} + F_ν = F_{ν−1} + S_1°, where S_1° is the optimal cost incurred at t_ν. We
can continue choosing F_{ν−2} to minimize S_3 = F_{ν−2} + F_{ν−1} + F_ν = F_{ν−2} + S_2°, and so on until
S_{ν+1} = J_M is minimized. Note that S_1 = F_ν = x^T(n_ν)Qx(n_ν) is determined only by x(n_ν), which
is independent of any other control inputs.


3.2 Inductive Construction of an Optimal Control Law with Given D_ν

We inductively derive an optimal controller which changes its control at the ν time instants
t_0, t_1, ..., t_{ν−1}. As we showed in the previous section, the inductive procedure goes backwards in time
from t_ν to t_0. Since S_1 = F_ν = x^T(n_ν)Qx(n_ν) + u^T(n_ν)Ru(n_ν) and x(n_ν) is independent of
u(n_ν), we can let u°(n_ν) = u°(M) = 0 and S_1° = x^T(n_ν)Qx(n_ν), where Q is symmetric and positive
semi-definite.

Induction Basis: S_1° = x^T(n_ν)Qx(n_ν) = x^T(n_ν)P(ν)x(n_ν) with P(ν) = Q, where Q is symmetric.


Inductive Assumption: Suppose that

$$
S^\circ_m = x^T(n_{\nu-m+1})\,P(\nu-m+1)\,x(n_{\nu-m+1})
$$

holds for some m where 1 ≤ m ≤ ν and P(ν − m + 1) is symmetric.


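Since x(n_{ν−m+1}) = A_{n_{ν−m+1}−n_{ν−m}} x(n_{ν−m}) + B_{n_{ν−m+1}−n_{ν−m}} u(n_{ν−m}) by (1), we can write S_m° as

$$
S^\circ_m = \left[A_{n_{\nu-m+1}-n_{\nu-m}}x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}}u(n_{\nu-m})\right]^T P(\nu-m+1)\left[A_{n_{\nu-m+1}-n_{\nu-m}}x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}}u(n_{\nu-m})\right] \qquad (12)
$$

From the definition of S_m and (9),

$$
\begin{aligned}
S_{m+1} &= S^\circ_m + F_{\nu-m} \\
&= S^\circ_m + x^T(n_{\nu-m})Qx(n_{\nu-m}) + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}\left[A_jx(n_{\nu-m})+B_ju(n_{\nu-m})\right]^T Q\left[A_jx(n_{\nu-m})+B_ju(n_{\nu-m})\right] \\
&\quad + (n_{\nu-m+1}-n_{\nu-m})\,u^T(n_{\nu-m})Ru(n_{\nu-m})
\end{aligned} \qquad (13)
$$

And, substituting (12), the above equation becomes

$$
\begin{aligned}
S_{m+1} ={}& \left[A_{n_{\nu-m+1}-n_{\nu-m}}x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}}u(n_{\nu-m})\right]^T P(\nu-m+1)\left[A_{n_{\nu-m+1}-n_{\nu-m}}x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}}u(n_{\nu-m})\right] \\
&+ x^T(n_{\nu-m})Qx(n_{\nu-m}) + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}\left[A_jx(n_{\nu-m})+B_ju(n_{\nu-m})\right]^T Q\left[A_jx(n_{\nu-m})+B_ju(n_{\nu-m})\right] \\
&+ (n_{\nu-m+1}-n_{\nu-m})\,u^T(n_{\nu-m})Ru(n_{\nu-m})
\end{aligned} \qquad (14)
$$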


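If we differentiate S_{m+1} with respect to u(n_{ν−m}), then

$$
\begin{aligned}
\frac{\partial S_{m+1}}{\partial u(n_{\nu-m})} ={}& 2B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)\left[A_{n_{\nu-m+1}-n_{\nu-m}}x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}}u(n_{\nu-m})\right] \\
&+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}\left[2B_j^TQA_jx(n_{\nu-m}) + 2B_j^TQB_ju(n_{\nu-m})\right] + 2(n_{\nu-m+1}-n_{\nu-m})Ru(n_{\nu-m})
\end{aligned} \qquad (15)
$$

$$
\begin{aligned}
={}& 2\Big\{\Big[B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQA_j\Big]x(n_{\nu-m}) \\
&+ \Big[B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQB_j + (n_{\nu-m+1}-n_{\nu-m})R\Big]u(n_{\nu-m})\Big\}
\end{aligned} \qquad (16)
$$

Note that P(ν − m + 1) is symmetric and the following three rules are applied to differentiate S_{m+1}
above:

$$
\frac{\partial}{\partial x}\left(x^TQx\right) = 2Qx \ (Q \text{ symmetric}), \qquad \frac{\partial}{\partial x}\left(x^TQy\right) = Qy, \qquad \frac{\partial}{\partial y}\left(x^TQy\right) = Q^Tx
$$

Setting ∂S_{m+1}/∂u(n_{ν−m}) = 0 and using Lemma 1 and Lemma 2 given later, we can obtain u°(n_{ν−m}), which
minimizes S_{m+1}, and thus obtain S°_{m+1}.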


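$$
\begin{aligned}
u^\circ(n_{\nu-m}) ={}& -\Big\{B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQB_j + (n_{\nu-m+1}-n_{\nu-m})R\Big\}^{-1} \\
&\cdot\Big\{B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQA_j\Big\}x(n_{\nu-m}) \\
={}& -K(\nu-m)\,x(n_{\nu-m})
\end{aligned} \qquad (17)
$$

where K(ν − m) denotes the gain matrix defined by (17).

Therefore, we can write

$$
x(n_{\nu-m+1}) = A_{n_{\nu-m+1}-n_{\nu-m}}x(n_{\nu-m}) + B_{n_{\nu-m+1}-n_{\nu-m}}u^\circ(n_{\nu-m}) = \left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right]x(n_{\nu-m}) \qquad (18)
$$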

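If we use (17) and (18), we have

$$
\begin{aligned}
S^\circ_{m+1} ={}& \left\{\left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right]x(n_{\nu-m})\right\}^T P(\nu-m+1)\left\{\left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right]x(n_{\nu-m})\right\} \\
&+ x^T(n_{\nu-m})Qx(n_{\nu-m}) \\
&+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}\left\{\left[A_j - B_jK(\nu-m)\right]x(n_{\nu-m})\right\}^T Q\left\{\left[A_j - B_jK(\nu-m)\right]x(n_{\nu-m})\right\} \\
&+ (n_{\nu-m+1}-n_{\nu-m})\left[K(\nu-m)x(n_{\nu-m})\right]^T R\left[K(\nu-m)x(n_{\nu-m})\right]
\end{aligned} \qquad (19)
$$

This equation can be rewritten as

$$
\begin{aligned}
S^\circ_{m+1} ={}& x^T(n_{\nu-m})\Big\{\left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right]^T P(\nu-m+1)\left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right] + Q \\
&+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}\left[A_j - B_jK(\nu-m)\right]^T Q\left[A_j - B_jK(\nu-m)\right] + (n_{\nu-m+1}-n_{\nu-m})K^T(\nu-m)RK(\nu-m)\Big\}x(n_{\nu-m}) \\
={}& x^T(n_{\nu-m})\,P(\nu-m)\,x(n_{\nu-m})
\end{aligned} \qquad (20)
$$

where P(ν − m) is obtained from K(ν − m) and P(ν − m + 1) as in (20). Also note that knowing
P(ν − m + 1) is enough to compute K(ν − m), because the other terms of (17) are known a priori.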

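Therefore, we find a symmetric matrix P(ν − m) satisfying S°_{m+1} = x^T(n_{ν−m})P(ν − m)x(n_{ν−m}).
From (17) and (20), we have the following recursive equations for obtaining P(ν − m) from
P(ν − m + 1), where m = 1, 2, ..., ν:

$$
K(\nu-m) = \Big\{B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQB_j + (n_{\nu-m+1}-n_{\nu-m})R\Big\}^{-1}\Big\{B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)A_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQA_j\Big\} \qquad (21)
$$

$$
\begin{aligned}
P(\nu-m) ={}& \left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right]^T P(\nu-m+1)\left[A_{n_{\nu-m+1}-n_{\nu-m}} - B_{n_{\nu-m+1}-n_{\nu-m}}K(\nu-m)\right] + Q \\
&+ \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}\left[A_j - B_jK(\nu-m)\right]^T Q\left[A_j - B_jK(\nu-m)\right] + (n_{\nu-m+1}-n_{\nu-m})K^T(\nu-m)RK(\nu-m)
\end{aligned} \qquad (22)
$$

Also, we know that at each time instant

$$
u^\circ(n_{\nu-m}) = -K(\nu-m)\,x(n_{\nu-m}) \qquad (23)
$$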

Hence, with P(ν) = Q, we can obtain K(i) and P(i) for i = ν − 1, ν − 2, ..., 0 recursively using
(21) and (22). At each time instant n_iΔ, i = 0, 1, 2, ..., ν − 1, the new control input value is
obtained using (23) by multiplying −K(i) by x(n_i), where x(n_i) is the estimate of the system state
at n_iΔ. Also, note that the optimal control cost is J°_M = x^T(0)P(0)x(0), where P(0) is
found from the above procedure.
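The backward recursion (21)-(22) and the feedback law (23) translate directly into code. The sketch below is a minimal, dependency-free Python version (plain nested lists instead of a matrix library, chosen only to keep it self-contained); all function names are hypothetical. `temporal_gains` takes change indices n_0 = 0 < n_1 < ... < n_{ν−1} < M and returns K(0), ..., K(ν−1) and P(0); `simulate_cost` re-evaluates J_M of (3) by running (1) under the hold law (4), which gives a numerical check that x^T(0)P(0)x(0) is the cost actually incurred.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_add(X: Matrix, Y: Matrix, s: float = 1.0) -> Matrix:
    """Return X + s*Y (s = -1.0 gives X - Y)."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_T(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def mat_scale(X: Matrix, s: float) -> Matrix:
    return [[s * x for x in row] for row in X]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_inv(X: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (Lemma 2: the matrix in (21) is invertible)."""
    n = len(X)
    aug = [[float(v) for v in row] + e for row, e in zip(X, identity(n))]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def temporal_gains(A: Matrix, B: Matrix, Q: Matrix, R: Matrix,
                   n_idx: Sequence[int], M: int) -> Tuple[List[Matrix], Matrix]:
    """Backward recursion (21)-(22) for change indices 0 = n_0 < n_1 < ... < n_{nu-1} < M.

    Returns ([K(0), ..., K(nu-1)], P(0)); the optimal J_M is x(0)^T P(0) x(0).
    """
    idx = list(n_idx) + [M]          # n_nu = M closes the last hold interval
    P = Q                            # P(nu) = Q
    gains: List[Matrix] = []
    for i in range(len(n_idx) - 1, -1, -1):
        d = idx[i + 1] - idx[i]      # hold length n_{i+1} - n_i
        Aj, Bj = [identity(len(A))], [mat_scale(B, 0.0)]
        for _ in range(d):           # A_j = A^j, B_j = B_{j-1} + A^{j-1} B
            Bj.append(mat_add(Bj[-1], mat_mul(Aj[-1], B)))
            Aj.append(mat_mul(Aj[-1], A))
        BdP = mat_mul(mat_T(Bj[d]), P)
        G = mat_add(mat_mul(BdP, Bj[d]), mat_scale(R, float(d)))  # B_d^T P B_d + d R
        H = mat_mul(BdP, Aj[d])                                  # B_d^T P A_d
        for j in range(1, d):
            BjQ = mat_mul(mat_T(Bj[j]), Q)
            G = mat_add(G, mat_mul(BjQ, Bj[j]))
            H = mat_add(H, mat_mul(BjQ, Aj[j]))
        K = mat_mul(mat_inv(G), H)                               # (21)
        Acl = mat_add(Aj[d], mat_mul(Bj[d], K), -1.0)
        P_new = mat_add(mat_mul(mat_mul(mat_T(Acl), P), Acl), Q)
        for j in range(1, d):
            Acl_j = mat_add(Aj[j], mat_mul(Bj[j], K), -1.0)
            P_new = mat_add(P_new, mat_mul(mat_mul(mat_T(Acl_j), Q), Acl_j))
        P = mat_add(P_new, mat_scale(mat_mul(mat_mul(mat_T(K), R), K), float(d)))  # (22)
        gains.append(K)
    return gains[::-1], P


def quad(v: Matrix, W: Matrix) -> float:
    """v^T W v for a column vector v."""
    return mat_mul(mat_mul(mat_T(v), W), v)[0][0]


def simulate_cost(A, B, Q, R, gains, n_idx, M, x0) -> float:
    """J_M of (3) under the hold law (4), with u = -K(i) x(n_i) as in (23)."""
    x = [[float(v)] for v in x0]
    gain_at = dict(zip(n_idx, gains))
    u: Matrix = []
    J = 0.0
    for k in range(M):
        if k in gain_at:              # delta(k) = 1: compute a new control value
            u = mat_scale(mat_mul(gain_at[k], x), -1.0)
        J += quad(x, Q) + quad(u, R)  # delta(k) = 0: u(k) = u(k-1) is held
        x = mat_add(mat_mul(A, x), mat_mul(B, u))
    return J + quad(x, Q)             # terminal term x^T(M) Q x(M)
```

For instance, with some model (A, B) and the weights of Section 5, `temporal_gains(A, B, Q, R, [0, 2, 10], 40)` returns the three gains for the change instants {0, 2Δ, 10Δ}; passing `list(range(40))` instead reduces to the ordinary 40-step LQ recursion, consistent with the remark that D_ν = D_M yields the traditional controller.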

To  prove  the  optimality  of  this  control  law  we  need  the  following  lemmas. 

Lemma 1 If Q is positive semi-definite and R is positive definite, then the matrices P(i), i = ν, ν − 1, ν − 2, ..., 0,
are positive semi-definite. Hence, the P(i)s are symmetric from the definition of a positive
semi-definite matrix.

Proof Since P(ν) = Q, by assumption P(ν) is positive semi-definite. Assume that for
k = i + 1, P(k) is positive semi-definite. We use induction to prove that P(i) is positive semi-definite. Note
that Q is positive semi-definite and R is positive definite. From (22) we have

$$
\begin{aligned}
P(i) ={}& \left[A_{n_{i+1}-n_i} - B_{n_{i+1}-n_i}K(i)\right]^T P(i+1)\left[A_{n_{i+1}-n_i} - B_{n_{i+1}-n_i}K(i)\right] + Q \\
&+ \sum_{j=1}^{n_{i+1}-n_i-1}\left[A_j - B_jK(i)\right]^T Q\left[A_j - B_jK(i)\right] + (n_{i+1}-n_i)K^T(i)RK(i)
\end{aligned} \qquad (24)
$$

Since P(i + 1) and Q are positive semi-definite, R is positive definite, and (n_{i+1} − n_i) > 0, every
term of (24) has the form X^T W X with W positive semi-definite, so y^T P(i) y ≥ 0 for every vector y.
This means that P(i) is positive semi-definite.

This inductive procedure proves the lemma.

Lemma 2 Given D_ν, the inverse matrix in (21) always exists.

Proof Let

$$
V = B^T_{n_{\nu-m+1}-n_{\nu-m}}P(\nu-m+1)B_{n_{\nu-m+1}-n_{\nu-m}} + \sum_{j=1}^{n_{\nu-m+1}-n_{\nu-m}-1}B_j^TQB_j + (n_{\nu-m+1}-n_{\nu-m})R
$$

From Lemma 1, P(ν − m + 1) is positive semi-definite. Therefore, y^T V y > 0 for every y ≠ 0,
because Q is positive semi-definite, R is positive definite and n_{ν−m+1} − n_{ν−m} > 0. This
implies that V is positive definite. Hence the inverse matrix exists.

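Theorem 1 Given D_ν, the K(i) (i = 0, 1, 2, ..., ν − 1) obtained from the above procedure are the optimal
feedback gains which minimize the cost function J_M (and J'_M) on [0, MΔ].

Proof Note that, given D_ν, J_M is a convex function of u(n_i), i = 0, 1, ..., ν − 1: at each stage (14) is
quadratic in u(n_{ν−m}) with Hessian 2V, which is positive definite by Lemma 2, so the stationary point (17)
is the unique minimizer. Thus the above feedback control law is optimal.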

Lemma 3 If p < q and D_p ⊂ D_q, then J°_{M,q} ≤ J°_{M,p}, where J°_{M,p} and J°_{M,q} are the optimal costs of
controls which change controls at time instants in D_p and D_q, respectively.

Proof Suppose that J°_{M,q} > J°_{M,p}. Then, in controlling the system with D_q, if we do not
change controls at the time instants in D_q − D_p and change controls at the time instants in D_p to the same
control inputs that were exercised to get J°_{M,p} with D_p, we obtain a cost J_M which is equal to J°_{M,p}. This
contradicts the fact that J°_{M,q} is the minimum cost obtainable with D_q, since we have found J_M,
which is equal to J°_{M,p} and therefore less than J°_{M,q}. Hence, J°_{M,p} ≥ J°_{M,q}.

This lemma implies that if we do not take the computation cost, μ, into consideration, then the
more control exercising points there are, the better (the lower the cost of) the controller. With the computation cost
included in the cost function, the statement above is no longer true. Therefore we need to
search for an optimal D_ν which minimizes the cost function J'_M. The following sections provide a
detailed discussion on searching for such an optimal solution. Note that if we let D_ν = D_M, then
the optimal temporal control law is the same as the traditional linear feedback optimal control law.


3.3 Optimal Temporal Control Law over D_ν Space with ν Given

When the number of control changing points, ν, and an initial system state x(0) are given, we
search over a set of possible D_νs and u(D_ν)s such that the cost function J'_M is minimized. This
can be done by varying the ν − 1 control changing time instants t_i, i = 1, 2, ..., ν − 1 (since t_0 = 0),
over the discrete set D_M = {0, Δ, 2Δ, ..., (M − 1)Δ} and applying the technique developed in the
previous section for each given D_ν. Let us denote the D_ν which minimizes J_M by D°_ν. Note
that when ν is given, minimizing J_M is equivalent to minimizing J'_M, since C_M = νμ is then constant. Since both D_ν and u(D_ν)
are control variates, to find a globally optimal solution either an exhaustive search or
a global search method such as a Genetic Algorithm or Simulated Annealing should be considered.
Later we present a numerical example in which an exhaustive search with the Steepest Descent Search
method is used. Searching for a globally optimal solution for a temporal controller calls for further
research.

3.4  Optimal  Temporal  Control  Law 

Assume that a maximum number of control changing points, ν_max, is given. By varying ν from
1 to ν_max we can find D°_{ν°} to obtain a globally optimal temporal controller which minimizes J'_M.
This can be done by first searching for D°_ν for each given ν and then comparing the cost function
J'_M = J_M + νμ at each D°_ν, ν = 1, 2, ..., ν_max. That is, let J'°_{M,ν} = x^T(0)P(0)x(0) + νμ, where
P(0) is calculated at D°_ν as in the previous section. Then we can obtain a global minimum cost
J'°_M = min_ν J'°_{M,ν} and an optimal number of control changes, ν°, at which J'°_{M,ν°} = J'°_M.
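Once J°_M has been computed for each candidate ν, the selection of Section 3.4 is a one-line minimization of J°_M + νμ. A small Python sketch, assuming the J°_M values are already available; as input it uses the values reported in Table 1 (Section 5), and the function name is hypothetical:

```python
from typing import Dict, Tuple


def best_number_of_changes(J_opt: Dict[int, float], mu: float) -> Tuple[int, float]:
    """Return (nu_opt, J'_M): the nu minimizing J°_M(nu) + nu * mu, and that minimum."""
    nu = min(J_opt, key=lambda v: J_opt[v] + v * mu)
    return nu, J_opt[nu] + nu * mu


# J°_M (without computation cost) at the best D_nu found for each nu, as listed in Table 1.
TABLE1_J = {1: 4.63089, 2: 1.44603, 3: 1.02388, 4: 1.02224,
            5: 0.996968, 6: 0.996746, 7: 0.996745}

for mu in (0.02, 0.01):
    nu, total = best_number_of_changes(TABLE1_J, mu)
    print(f"mu = {mu}: nu_opt = {nu}, J'_M = {total:.6f}")
# prints:
# mu = 0.02: nu_opt = 3, J'_M = 1.083880
# mu = 0.01: nu_opt = 5, J'_M = 1.046968
```

These agree with Table 1 (ν° = 3 for μ = 0.02 and ν° = 5 for μ = 0.01); with μ = 0 the largest ν is selected, as Lemma 3 predicts.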

3.5  Terminal  State  Constraints 

The terminal state constraints may be used to check whether the optimal temporal controller with D°_{ν°}
can drive the system state to a permissible final state within a given time. Let X_f be a set of
allowed terminal states. If x(n_ν) ∈ X_f, then the control law is said to be stable in terms of the
terminal state constraints, and not stable if x(n_ν) ∉ X_f. If the globally optimal temporal controller
obtained from the above procedure is not stable, ν should be increased until a stable one is found.
One way of specifying terminal state constraints for regulators might be |x(M)_i| < ε_i, where x(M)_i
is the ith element of the state vector x(M).


3.6  Algorithm  to  Derive  an  Optimal  Temporal  Controller 


To summarize the above discussion, we provide in Figure 2 a complete algorithm to search for a
globally optimal temporal controller under the assumption that the initial state x(0) is given.

In the algorithm, a neighbor of D_ν = {n_0Δ, n_1Δ, n_2Δ, ..., n_{ν−1}Δ} is defined to be any member
of the set {{n_0Δ, n'_1Δ, ..., n'_{ν−1}Δ} : |n'_i − n_i| ≤ 1, i = 1, 2, ..., ν − 1}.

3.7  Optimal  Temporal  Controllers  over  an  Initial  State  Space 

Note that D°_ν might become different if a new initial system state x̂(0) is used instead of x(0) when
the state vector is in ℝ^m, where m ≥ 2. This is because the cost function J_M = x^T(0)P(0)x(0)
depends on x(0) as well as P(0). Thus, D°_ν is dependent on the initial state x(0). However, when
m = 1 it can be shown that D°_ν is independent of any initial state. To see this, let x̂(0) = kx(0) ∈ ℝ^1,
and let P(0) and P̂(0) be the optimal matrices with initial states x(0) and x̂(0), respectively, i.e.,

$$
J^\circ_M = x^T(0)P(0)x(0), \qquad \hat{J}^\circ_M = \hat{x}^T(0)\hat{P}(0)\hat{x}(0)
$$

From the optimality of P(0) with respect to x(0),

$$
x^T(0)\hat{P}(0)x(0) \ge x^T(0)P(0)x(0) \qquad (25)
$$

Multiplying the above inequality by k^2 we have

$$
k^2x^T(0)\hat{P}(0)x(0) = \hat{x}^T(0)\hat{P}(0)\hat{x}(0) \ge k^2x^T(0)P(0)x(0) = \hat{x}^T(0)P(0)\hat{x}(0) \qquad (26)
$$

On the other hand, due to the optimality of P̂(0) we have

$$
\hat{x}^T(0)P(0)\hat{x}(0) \ge \hat{x}^T(0)\hat{P}(0)\hat{x}(0) \qquad (27)
$$

Therefore, P̂(0) = P(0) (for x̂(0) ≠ 0). This implies the optimality of P(0) and D°_ν for any initial state
x̂(0) ∈ ℝ^1.

Generally speaking, the above result will not hold for m ≥ 2. However, using the same
argument as above we can prove that for any initial state x̂(0) = kx(0), x̂(0) and x(0) will
have the same D°_ν as well as the same P(0).


ν° = 1
J'° = ∞

for ν = 1 to ν_max {

  /* Several different search starting points */
  for i = 1 to NumInitPts {

    D_ν =

    /* Iterate until a local minimum is found - Steepest Descent Search */
    while (MinimumFound != True) {

      Find optimal costs for neighboring points of D_ν using Theorem 1
      if (a Local Minimum at D_ν)

      then {

        MinimumFound = True
        J'_{M,ν} = J'_M at D_ν }

      else

        D_ν = a neighbor of D_ν with the smallest J'_M

    }

  }
  then {

    ν° = ν

Figure  2:  Complete  algorithm  to  find  an  optimal  temporal  controller. 
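The search of Figure 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `cost(D)` is a caller-supplied function returning J°_M for an index tuple D = (n_0, ..., n_{ν−1}) (for example x^T(0)P(0)x(0) from the recursion (21)-(22)), `starts(ν)` supplies the NumInitPts starting points, and the neighbor set follows the definition in Section 3.6 while additionally keeping the indices strictly increasing and below M. All names are hypothetical.

```python
import itertools
from typing import Callable, Iterable, List, Sequence, Tuple

Index = Tuple[int, ...]


def neighbors(D: Index, M: int) -> List[Index]:
    """Neighbors of D = (n_0, ..., n_{nu-1}) with n_0 = 0: each n_i, i >= 1, moves by at
    most 1; candidates must stay strictly increasing and below M."""
    result = []
    for shift in itertools.product((-1, 0, 1), repeat=len(D) - 1):
        cand = (0,) + tuple(n + s for n, s in zip(D[1:], shift))
        increasing = all(a < b for a, b in zip(cand, cand[1:]))
        if cand != D and increasing and cand[-1] < M:
            result.append(cand)
    return result


def steepest_descent(cost: Callable[[Index], float], D0: Sequence[int],
                     M: int) -> Tuple[Index, float]:
    """Move to the cheapest neighbor until no neighbor is cheaper (a local minimum)."""
    D = tuple(D0)
    best = cost(D)
    while True:
        scored = [(cost(N), N) for N in neighbors(D, M)]
        if not scored or min(scored)[0] >= best:
            return D, best
        best, D = min(scored)


def search_temporal(cost: Callable[[Index], float], M: int, mu: float, nu_max: int,
                    starts: Callable[[int], Iterable[Sequence[int]]]) -> Tuple[int, Index, float]:
    """Outer loops of Figure 2: return (nu_opt, D_opt, J') minimizing J°_M(D) + nu * mu."""
    best: Tuple[int, Index, float] = (0, (), float("inf"))
    for nu in range(1, nu_max + 1):
        for D0 in starts(nu):
            D, J = steepest_descent(cost, D0, M)
            if J + nu * mu < best[2]:
                best = (nu, D, J + nu * mu)
    return best
```

Keeping the best result over several starting points only guards against poor local minima; like the procedure in Figure 2, it does not guarantee a global optimum.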


4  Implementation 


To implement temporal control, we need to calculate and store the K(i) matrices in (21) and use them
when controlling the system utilizing (23). Note that in traditional optimal linear control a similar
matrix is obtained and used at every time instant in D_M to generate the control input value. While
the feedback gain matrices for a traditional linear optimal controller are independent of initial states,
the number of control exercises, ν, and the K(i) matrices are dependent on initial states for temporal
control systems. But if the possible set of initial states is in ℝ^1, they are independent of the initial
states. Effective deployment of temporal control requires that we know the range of initial state
values and generate K(i) matrices for each group of initial states. A sensitivity analysis is required to determine
how many distinct matrices need to be stored.

In  order  to  implement  temporal  control  we  require  an  operating  system  that  supports  scheduling 
control  computations  at  specific  time  instants.  The  Maruti  system  developed  at  the  University  of 
Maryland  is  a  suitable  host  for  the  implementation  of  temporal  control  [10,  8,  7].  In  Maruti,  all 
executions  are  scheduled  in  time  and  the  time  of  execution  can  be  modified  dynamically,  if  so 
desired.  This  is  in  contrast  with  traditional  cyclic  executives  often  used  in  real-time  systems,  which 
have  a  fixed,  cyclic  operation  and  which  are  well  suited  only  for  the  sampled  data  control  systems 
operating in a static environment. It is the availability of a system such as Maruti that allows
us  to  consider  the  notion  of  temporal  control,  in  which  time  becomes  an  emergent  property  of  the 
system. 


5  Example 

To illustrate the advantages of a temporal control scheme, let us consider a simple example of a rigid
body satellite control problem [12]. The system state equations are as follows:


s(fc+  1) 
y{k) 


0  1 

x{k)  + 

0 

-1  2 

0.00125 

1  1  j  21 

{k) 

u{k) 


where k represents the time index and one unit of time is the discretized subinterval of length
Δ = 0.05. The linear quadratic performance index in (5) is used here with the following
parameters.


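$$
Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad R = 0.0001, \quad \mu = 0.02 \text{ and } 0.01, \quad M = 40, \quad \Delta = 0.05, \quad \epsilon_i = 0.01,\ i = 1, 2, \quad x(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \qquad (28)
$$

Figure 3: Optimal Linear Control with Δ = 0.05.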


The objective of the control is to drive the satellite to the zero position, and the desired goal
state is x_f = [0, 0]^T. The terminal state constraint is |x_i(40)| < ε_i, i = 1, 2. With the equal
sampling interval Δ = 0.05 and M = 40, the optimal linear feedback control of this system has cost
function J_M = 0.984678 (without computational cost) and J'_M = 1.784678 (with computational
cost, μ = 0.02), and is shown in Figure 3. The terminal state constraint is satisfied at 0.8 sec.

If we apply the temporal control scheme presented above to this problem with μ = 0.02, we find
that the optimal number of control changes for this example is 3 and D°_3 = {0, 2Δ, 10Δ}, with a
cost J'_M = 1.08388. Note that the 40-step optimal linear feedback controller given above has a cost
J'_M = 1.784678 when computation cost is considered. Table 1 shows how this optimal controller
is obtained when we set ν_max = 7. Figure 4(a) shows the system trajectory when this three-step
optimal temporal controller is used to control the system. This trajectory satisfies the terminal
state constraint at 0.8 sec as well. Also, the maximum control input magnitudes, |u|_max, in both
controllers lie within the same bound B = 50, which may be another constraint on control.

ν | D_ν (time indices, in units of Δ) | Cost (J'_M) with μ = 0.02 | Cost (J'_M) with μ = 0.01
1 | {0} | 4.63089 + μ = 4.65089 | 4.63089 + μ = 4.64089
2 | {0, 1} | 1.44603 + 2μ = 1.48603 | 1.44603 + 2μ = 1.46603
3 | {0, 2, 10} | 1.02388 + 3μ = 1.08388 | 1.02388 + 3μ = 1.05388
4 | {0, 2, 9, 11} | 1.02224 + 4μ = 1.10224 | 1.02224 + 4μ = 1.06224
5 | {0, 1, 3, 8, 11} | 0.996968 + 5μ = 1.096968 | 0.996968 + 5μ = 1.046968
6 | {0, 1, 3, 8, 11, 24} | 0.996746 + 6μ = 1.116746 | 0.996746 + 6μ = 1.056746
7 | {0, 1, 3, 8, 11, 23, 25} | 0.996745 + 7μ = 1.136745 | 0.996745 + 7μ = 1.066745

Table 1: Calculating optimal temporal controllers.

The optimal temporal controller found with $\mu = 0.01$ has $\nu = 5$ and $D_5^o = \{0, \Delta, 3\Delta, 8\Delta, 11\Delta\}$, with a cost (excluding computational cost) of $J_M = 0.996968$. Note that this cost is even less than 1.01269, the cost of the optimal controller with an equal sampling period of 0.1 sec and 20 control changes.
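As a quick arithmetic check on Table 1, the Python sketch below adds the computational cost $\nu\mu$ to each cost without computation and picks the $\nu$ with the smallest total. The names `COST_WITHOUT_COMPUTATION`, `total_cost`, and `best_number_of_changes` are illustrative only; the numbers are copied from Table 1, and the sketch does not recompute $J_M$ itself.

```python
# Costs without computational cost, copied from Table 1 (nu -> J_M).
COST_WITHOUT_COMPUTATION = {
    1: 4.63089, 2: 1.44603, 3: 1.02388, 4: 1.02224,
    5: 0.996968, 6: 0.996746, 7: 0.996745,
}


def total_cost(nu: int, mu: float) -> float:
    """Total cost = cost without computation + nu * mu (mu = cost per control change)."""
    return COST_WITHOUT_COMPUTATION[nu] + nu * mu


def best_number_of_changes(mu: float) -> tuple:
    """Return (nu, total cost) minimizing the total cost over nu = 1..7."""
    nu = min(COST_WITHOUT_COMPUTATION, key=lambda n: total_cost(n, mu))
    return nu, total_cost(nu, mu)


if __name__ == "__main__":
    for mu in (0.02, 0.01):
        nu, cost = best_number_of_changes(mu)
        print(f"mu = {mu}: nu = {nu}, cost = {cost:.6f}")
```

Running it selects $\nu = 3$ (total 1.08388) for $\mu = 0.02$ and $\nu = 5$ (total 1.046968) for $\mu = 0.01$, in agreement with the text.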

If we change control values only at three time instants with an equal sampling period, $13\Delta = 0.65$ sec, the total cost incurred on the time interval $[0, 2]$ is 2.2823 (without computational cost). This is more than twice the cost of our optimal temporal controller, and the terminal state constraint is not satisfied even at the end of the controlling interval at 2.0 sec. Figure 4(b) clearly shows the advantage of an optimal temporal controller over an optimal controller with equidistant sampling: their performances differ noticeably, although both change controls at three time instants. The optimal temporal control with three control changes performs almost the same as the 40-step linear optimal controller. This implies that enforcing a constant sampling rate throughout the entire controlling interval may simply waste computational power that could otherwise be used for other concurrent control tasks in critical systems.

Obtaining $D_3^o$ for this example was simple, since $J_{40}$ has only one minimum over the entire set of possible $D_3$s on $[0, 40\Delta]$. Figures 5(a) and 5(b) show that $J_{40}$ has only one local (and hence global) minimum, at $D_3^o = \{0, 2\Delta, 10\Delta\}$. We obtained this optimum by steepest descent search from the starting point $\{0, \Delta, 10\Delta\}$, after evaluating only three points: $\{0, \Delta, 10\Delta\}$, $\{0, 2\Delta, 10\Delta\}$, and $\{0, 3\Delta, 10\Delta\}$. Figure 5(a) also shows that the choice of $n_1$ has a greater influence on the total cost than that of $n_2$, since the cost varies more sharply along the $n_1$ axis. This means that the initial stage of the control needs more attention than the later stage in this linear control problem.

However, if we change one of the parameters of the performance index, $R$, from 0.0001 to 0.001, we get two local minima, at $D_3^1 = \{0, \Delta, 2\Delta\}$ and $D_3^2 = \{0, 3\Delta, 19\Delta\}$, of which the one with the smaller cost is optimal. Figure 6 shows this. In this case we need to run the steepest descent search at least twice, from different starting points, to get an optimal solution. We implemented this steepest descent search algorithm in Mathematica and used it to generate $D_\nu^o$ for several examples by varying $\nu$. For our examples of linear time-invariant control problems, the number of local minima was small enough that we could efficiently apply this search method just a few times with different initial points to get a global minimum without an exhaustive search over the entire $D_\nu$ space.

Figure 4: C… $\{0, 2\Delta, 10\Delta\}$

Figure 6: Costs near $D_3^1$ and $D_3^2$ with $R = 0.001$.
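The multi-start search described above can be sketched as a discrete steepest descent over sets of control-change instants. This is a hypothetical Python reconstruction, not the Mathematica implementation used here: the neighbourhood (shift one instant by one step of $\Delta$, keeping the first instant at 0 and the instants strictly increasing inside $(0, M)$), the names `steepest_descent` and `multi_start`, and the two-minimum `toy_cost` are all assumptions. In practice `cost` would evaluate $J_M$ for the given instants.

```python
from typing import Callable, Iterable, List, Tuple

Instants = Tuple[int, ...]  # control-change instants in units of Delta, e.g. (0, 2, 10)


def neighbors(d: Instants, m: int) -> List[Instants]:
    """Sets obtained by moving one instant (other than the first, fixed at 0) by +/-1."""
    result = []
    for i in range(1, len(d)):
        for step in (-1, 1):
            cand = list(d)
            cand[i] += step
            below_ok = cand[i - 1] < cand[i]
            above_ok = i + 1 == len(d) or cand[i] < cand[i + 1]
            if 0 < cand[i] < m and below_ok and above_ok:
                result.append(tuple(cand))
    return result


def steepest_descent(cost: Callable[[Instants], float], start: Instants, m: int) -> Tuple[Instants, float]:
    """Move to the best neighbour until no neighbour lowers the cost (a local minimum)."""
    current, current_cost = start, cost(start)
    while True:
        scored = [(cost(n), n) for n in neighbors(current, m)]
        if not scored:
            return current, current_cost
        best_cost, best = min(scored)
        if best_cost >= current_cost:
            return current, current_cost
        current, current_cost = best, best_cost


def multi_start(cost: Callable[[Instants], float], starts: Iterable[Instants], m: int) -> Tuple[Instants, float]:
    """Run steepest descent from each start and keep the lowest local minimum found."""
    return min((steepest_descent(cost, s, m) for s in starts), key=lambda r: r[1])


def toy_cost(d: Instants) -> float:
    """Hypothetical cost with two local minima, at (0, 2, 10) and (0, 20, 30)."""
    a = (d[1] - 2) ** 2 + (d[2] - 10) ** 2
    b = (d[1] - 20) ** 2 + (d[2] - 30) ** 2 - 5
    return min(a, b)
```

With `toy_cost` and $M = 40$, a single descent from (0, 1, 10) stops at the local minimum (0, 2, 10), while adding the start (0, 19, 29) lets `multi_start` reach the better minimum (0, 20, 30). This mirrors why two or more starting points are needed when $R = 0.001$.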

6  Discussion 

Employing the temporal control methodology in concurrent real-time embedded systems will have a significant impact on the way computational resources are utilized by control tasks. For a given regulator, we can find a minimal number of control computations that achieves almost the same control performance as a traditional controller with an equal sampling period. This significantly reduces the CPU time for each control task and thus increases the number of real-time control functions that can be accommodated concurrently in one embedded system. In particular, in a hierarchical control system, if temporal controllers are employed as lower-level controllers, the higher-level controllers gain a great degree of flexibility in managing resource usage by adjusting the computational requirements of each lower-level controller. For example, in emergency situations the higher-level controller may force the lower-level controllers to run as infrequently as possible (thus freeing computational resources for handling the emergency). In contrast, during normal operations the temporal control tasks may run as necessary, and the additional computation time can be used for higher-level functions such as monitoring and planning.

In addition, the method developed in Section 3.2, which calculates an optimal controller when the control changing time instants are given, can be applied to the case in which the control computing time instants cannot be periodic. For example, when a small embedded controller is used to control several functions, it may be preferable to design a temporal controller for each function such that the required computational resources are appropriately scheduled while retaining the required degree of control for each function.

7  Conclusion 

In this paper we proposed a temporal control technique based on a new cost function that takes into account computational cost as well as state and input costs. In this scheme, new control input values are defined at time instants that are not necessarily regularly spaced. For the linear control problem, we showed that almost the same quality of control can be achieved with far fewer computations than a traditional controller uses.

The proposed formulation of temporal control is likely to have a significant impact on the way concurrent embedded real-time systems are designed. In a hierarchical control environment, this approach is likely to result in designs that are significantly more efficient and flexible than traditional control schemes. Because they use fewer computational resources, the lower-level temporal controllers make resources available to the higher-level controllers without compromising the quality of control.
Abstract

Real-time systems differ from conventional systems in that every task in a real-time system has a timing constraint. Failure to execute the tasks under their timing constraints may result in fatal errors. Sometimes it may be impossible to execute all the tasks in the task set under their timing constraints. For a system with limited resources, one solution to this overload problem is to reject some of the tasks in order to generate a feasible schedule for the rest. In this paper, we consider the problem of scheduling a set of tasks without preemption, in which each task is assigned a criticality and a weight. The goal is to generate an optimal schedule such that all of the critical tasks are scheduled and the non-critical tasks are then included so that the weight of the rejected non-critical tasks is minimized. We approach the problem of finding the optimal schedule in two steps. First, we select a permutation sequence of the task set. Second, a pseudo-polynomial algorithm is proposed to generate an optimal schedule for the permutation sequence. If the global optimum is desired, all permutation sequences have to be considered. Instead, we propose to incorporate the simulated annealing technique to deal with the large search space. Our experimental results show that our algorithm is able to generate near-optimal schedules for the task sets in most cases while considering only a limited number of permutations.
1  Introduction


Real-time computer systems are essential for embedded applications such as robot control, flight control, and medical instrumentation. In such systems, the computer is required to support the execution of applications in which the timing constraints of the tasks are specified by the physical system being controlled. The correctness of the system depends on the temporal correctness as well as the functional correctness of the tasks. Failure to satisfy the timing constraints can incur fatal errors. How to schedule the tasks so that their timing constraints are met is crucial to the proper operation of a real-time system.

As an example of an embedded system, consider an air defense system that monitors an air space continuously using radars. Whenever an intruder is identified, the embedded control system characterizes it and initiates the responsive action in a timely manner. The temporal constraints for this phase of processing differ depending on whether the intruder is a missile, a fighter, a bomber, a dummy, etc. Such a system is designed to handle a number of intruders concurrently. If the processing requests exceed the capacity of the system, we expect the system to handle a set of the most significant intruders, not an arbitrary set of intruders. This involves rejecting the processing of some real-time tasks based on their importance. In this paper, we consider the problem of creating a schedule for a set of tasks such that all critical tasks are scheduled and then, among the non-critical tasks, we select those that can be scheduled feasibly while maximizing the sum of the weights of the selected non-critical tasks.

As all systems have finite resources, their ability to execute a set of tasks while meeting the temporal requirements is limited. Clearly, overload conditions may arise if more tasks have to be processed than the available resources can handle. Under such overload conditions, we have two choices: we may augment the available resources, or reject some tasks (or both). In [8], a technique was presented to handle transient overloads by taking advantage of redundant computing resources. Another permissible solution to this problem is to reject some of the tasks in order to generate a feasible schedule for the rest. Once a task is accepted by the system, the system should be able to finish it under its timing constraint. Some algorithms have been shown to perform well under low or moderate resource utilization; however, their performance degrades if the system is overloaded [2]. For example, the EDF algorithm has been shown to be optimal for a periodic task set [6]: if there exists a feasible schedule for the task set, EDF can find one. However, if the task set is not feasible, EDF may perform unsatisfactorily. The reason is that a task with an urgent deadline may not be able to finish before its deadline, yet, because of that urgent deadline, the task has a high priority to use the processor and thus keeps wasting CPU time until its deadline passes. The wasted CPU time may further prevent other tasks from meeting their deadlines. Another problem is that there is little control over which tasks will meet their deadlines and which will not.
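The overload behaviour of EDF described above can be reproduced with a small simulation. This Python sketch assumes preemptive EDF in unit time slots with integer parameters, and assumes that a task which has not finished by its deadline is simply dropped; `edf_completed` and the two task sets are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: name -> (ready, computation, deadline) in integer time units.
TaskSet = Dict[str, Tuple[int, int, int]]


def edf_completed(tasks: TaskSet, horizon: int) -> List[str]:
    """Preemptive EDF in unit time slots over [0, horizon).

    A task is eligible in slot t only if ready <= t < deadline and it still has
    work left; an unfinished task whose deadline has passed is simply dropped.
    Returns the names of the tasks that complete, in completion order."""
    remaining = {name: comp for name, (_, comp, _) in tasks.items()}
    completed: List[str] = []
    for t in range(horizon):
        eligible = [n for n, (r, _, d) in tasks.items() if r <= t < d and remaining[n] > 0]
        if not eligible:
            continue
        run = min(eligible, key=lambda n: tasks[n][2])  # earliest deadline first
        remaining[run] -= 1
        if remaining[run] == 0:
            completed.append(run)
    return completed


if __name__ == "__main__":
    print(edf_completed({"A": (0, 3, 2), "B": (0, 2, 3)}, 5))  # [] : A wastes slots 0-1, B misses too
    print(edf_completed({"B": (0, 2, 3)}, 5))                  # ['B'] once A has been rejected
```

Task A can never finish, yet its earlier deadline lets it occupy slots 0 and 1, so B also misses its deadline and nothing completes; rejecting A up front lets B complete.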

For an overloaded system, how to select tasks for rejection on the basis of their importance becomes a significant issue. When the tasks have equal weight, an optimal schedule can be defined as one in which the number of rejected tasks is minimized. In our previous study [3], we used a super-sequence-based scheduling algorithm to compute the optimal schedule for the tasks. In this paper, the criticality of the tasks is taken into consideration. First, if a task cannot meet its deadline, it is rejected so that CPU time is not wasted. Second, we would like to schedule tasks such that less important tasks may be rejected in favor of more important ones. We classify tasks into two categories: critical and non-critical. The critical tasks are so crucial to the system that they must not be rejected. The non-critical tasks are given weights that reflect their importance, and they may be rejected. A schedule is feasible if all critical tasks in the task set are accepted and are guaranteed to meet their timing constraints. If there exists no feasible schedule for the task set, the task set is considered infeasible. The loss of a schedule is defined to be the sum of the weights of the rejected non-critical tasks. A schedule is optimal if it is feasible and its loss is minimum.
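The definitions of feasibility and loss can be made concrete with a short Python sketch. It assumes a single processor, executes the accepted tasks in the given order, starts each as early as possible, and encodes CRITICAL as a weight of `None`; `loss_of` and this encoding are illustrative, not part of the paper.

```python
from typing import List, Optional, Sequence, Tuple

CRITICAL = None  # hypothetical encoding: a weight of None marks a critical task

# (ready time, computation time, deadline, weight or CRITICAL)
Task = Tuple[float, float, float, Optional[int]]


def loss_of(tasks: Sequence[Task], accepted: List[int]) -> Optional[int]:
    """Loss of executing `accepted` (task indices, in order) without preemption,
    or None if the resulting schedule is not feasible."""
    finish = 0.0
    for i in accepted:
        ready, comp, deadline, _ = tasks[i]
        finish = max(finish, ready) + comp  # start as early as possible
        if finish > deadline:
            return None  # an accepted task would miss its deadline
    rejected = [i for i in range(len(tasks)) if i not in accepted]
    if any(tasks[i][3] is CRITICAL for i in rejected):
        return None  # rejecting a critical task is never allowed
    return sum(tasks[i][3] for i in rejected)
```

For example, with tasks `[(0, 2, 3, None), (0, 2, 4, 7), (1, 1, 5, 3)]`, accepting `[0, 2]` gives loss 7, accepting all three gives loss 0, and any choice that drops task 0 is infeasible.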

We first propose a Permutation Scheduling Algorithm (PSA) to generate an optimal schedule for a permutation, which is a well-defined ordering of tasks. When scheduling a task set of $n$ tasks, in the worst case there might be up to $n!$ permutations to consider. We propose a Set Scheduling Algorithm (SSA), which incorporates the simulated annealing technique [9] to deal with the large search space of permutations. PSA is invoked by SSA to compute the optimal schedule for each permutation. Using the schedulability and loss of the schedule generated by PSA as feedback, SSA controls the progress of the search for an optimal schedule for the task set. Our experimental results show that SSA is able to generate feasible schedules for task sets of 100 tasks with success ratios of at least 98% and loss ratios below 10% in most cases, while searching fewer than 5,000 permutations. For each permutation, the average number of schedules computed by PSA (as invoked by SSA) to generate an optimal schedule is usually less than 500. The SSA algorithm can therefore be considered efficient in dealing with the exponential search space to produce a satisfactory near-optimal schedule.

In the following section, we define the scheduling problem. In Section 3, we show how to schedule a permutation. In Section 4, we incorporate the technique of simulated annealing and discuss how to schedule a task set. In Section 5, the results of our experiments are presented, followed by our conclusion.

2  The  Problem 

A task set is represented as $T = \{\tau_1, \tau_2, \ldots, \tau_n\}$. A task $\tau_i$ can be characterized as a record $(r_i, c_i, d_i, w_i)$, representing the ready time, computation time, deadline, and criticality of the $i$th task. Time is expressed as a real number. A task cannot be started before its ready time. Once started, the task must use the processor without preemption for $c_i$ time units, and must be finished by its deadline. If a task is so important to the system that its rejection is not allowed, $w_i$ is set to CRITICAL. Otherwise, $w_i$ is assigned an integral value that indicates its importance, and the task is subject to rejection if necessary. A permutation sequence, or simply a permutation, is an ordered sequence of the tasks in the task set. Scheduling is the process of binding starting times to the tasks such that each task executes according to the schedule. Note that a non-preemptive schedule on a single processor implies a sequence for the execution of tasks. For convenience, we hereafter use a sequence to represent a schedule. A permutation is denoted by $\mu = (\tau_1, \ldots, \tau_n)$, where $\tau_i$ is the $i$th task in the permutation. A prefix of a permutation is denoted by $\mu_k = (\tau_1, \ldots, \tau_k)$.
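Binding start times to a sequence can be sketched in Python as follows. The sketch assumes that each task starts as soon as it is ready and the processor is free; for a fixed order on one processor without preemption, starting earlier never delays any later task, so this choice loses nothing. The `Task` record mirrors $(r_i, c_i, d_i, w_i)$, with `None` standing for CRITICAL; all names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    ready: float
    comp: float
    deadline: float
    weight: Optional[int]  # None stands for CRITICAL in this sketch


def bind_start_times(sequence: List[Task]) -> Optional[List[float]]:
    """Bind start times to a task sequence on one processor without preemption.

    Each task starts as soon as the processor is free and the task is ready;
    returns None if some task would finish after its deadline."""
    starts: List[float] = []
    finish = 0.0
    for task in sequence:
        start = max(finish, task.ready)
        finish = start + task.comp
        if finish > task.deadline:
            return None
        starts.append(start)
    return starts


def prefix(mu: List[Task], k: int) -> List[Task]:
    """The prefix mu_k = (tau_1, ..., tau_k) of a permutation mu."""
    return mu[:k]
```

For `mu = [Task(0, 2, 3, None), Task(0, 2, 4, 7), Task(1, 1, 5, 3)]`, the start times are 0, 2, and 4, whereas the reversed order makes the critical task finish at 6, after its deadline 3.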
To schedule a task set, we need to consider the possible permutations of the task set. We first consider an algorithm for scheduling a permutation. The finish time of a schedule is the finish time of the last task in the schedule. Let $S_k(t)$ denote a schedule of $\mu_k$ with finish time no more than $t$. We use $W(S_k(t))$ to represent the weight of $S_k(t)$, which is the sum of the weights of the non-critical tasks in the schedule. A feasible schedule of $\mu_k$ is defined as follows:

Definition: $S_k(t)$, $1 \le k \le n$, is a feasible schedule of $\mu_k$ at $t$ if and only if:

1. $S_k(t)$ is a subsequence of $\mu_k$,

2. the finish time of $S_k(t)$ is less than or equal to $t$, and

3. all critical tasks in $\mu_k$ are included in $S_k(t)$.

An optimal schedule of $\mu_k$ is defined as follows:

Definition: $\sigma_k(t)$ is an optimal schedule of $\mu_k$ at $t$ if and only if:

1. $\sigma_k(t)$ is a feasible schedule of $\mu_k$ at $t$, and

2. for any feasible schedule $S_k(t)$ of $\mu_k$, $W(\sigma_k(t)) \ge W(S_k(t))$.

In other words, an optimal schedule is a feasible schedule with minimum loss. There may be more than one optimal schedule for $\mu_k$ with finish time less than or equal to $t$. We denote by $\mathcal{S}_k(t)$ the set of all optimal schedules for $\mu_k$ at $t$. Hence, if $S_k(t) \in \mathcal{S}_k(t)$, then $S_k(t)$ is an optimal schedule for $\mu_k$ at $t$.

The scheduling problem considered here is NP-complete. To see this, note that its related decision problem, which asks for a feasible schedule with loss no more than a given bound, can easily be shown to be NP-complete by restriction to the PARTITION problem [1]: set $r_i = 0$, $w_i = c_i$, and $d_i = \frac{1}{2}\sum_{j=1}^{n} c_j$ for $1 \le i \le n$, and take the loss bound to be $\frac{1}{2}\sum_{j=1}^{n} c_j$.
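The restriction to PARTITION can be checked on small instances with the Python sketch below. `partition_to_tasks` builds the task set given above; `min_loss` is a brute-force, exponential-time solver included only for illustration, and both names are hypothetical. A PARTITION instance is a yes-instance exactly when the minimum loss is at most $\frac{1}{2}\sum_j c_j$: any accepted subset must fit before the common deadline, so its weight (equal to its computation time) is at most half the total, and a loss of at most half the total forces it to be exactly half.

```python
from itertools import combinations
from typing import List, Tuple

Task = Tuple[float, float, float, float]  # (r_i, c_i, d_i, w_i)


def partition_to_tasks(sizes: List[int]) -> List[Task]:
    """Restriction used in the text: r_i = 0, w_i = c_i = a_i, d_i = (1/2) * sum_j c_j."""
    half = sum(sizes) / 2
    return [(0, a, half, a) for a in sizes]


def min_loss(tasks: List[Task]) -> float:
    """Brute-force minimum loss (exponential; for illustration only).

    With all ready times 0 and a common deadline, a subset can be run back to
    back from time 0 iff its total computation time fits before the deadline."""
    total_weight = sum(w for _, _, _, w in tasks)
    deadline = tasks[0][2] if tasks else 0
    best = total_weight
    for k in range(len(tasks) + 1):
        for subset in combinations(tasks, k):
            if sum(c for _, c, _, _ in subset) <= deadline:
                best = min(best, total_weight - sum(w for _, _, _, w in subset))
    return best


def has_partition(sizes: List[int]) -> bool:
    """PARTITION is a yes-instance iff the minimum loss is at most (1/2) * sum of sizes."""
    return min_loss(partition_to_tasks(sizes)) <= sum(sizes) / 2
```

For instance, `has_partition([3, 1, 1, 2, 2, 1])` holds (3 + 2 = 5 is half of 10), while `has_partition([2, 3, 7])` does not.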

3  Scheduling  a  Permutation 

We consider the problem of finding an optimal schedule for the task set in two steps: select a permutation, and find an optimal schedule for the permutation. The methodology is presented in Figure 1.
Loop 1: choose a permutation $\mu$ of $T$
  Loop 2: for $\mu_k$, $k = 1, 2, \ldots, n$
    Loop 3: compute $\sigma_k(t)$

Figure 1: Methodology

Clearly, to find the optimal schedule for the task set, all possible permutations have to be considered. How to search the permutations is addressed in Section 4. In Loop 3, optimal schedules for $\mu_k$ are computed at certain time instants. In the following, we first discuss how to compute $\sigma_k(t)$ for a given $t$, and then how to determine the time instants for $\mu_k$.

3.1  Computing $\sigma_k(t)$

We use dynamic programming to compute $\sigma_k(t)$ based on $\sigma_{k-1}(t')$ with $t' \le t$. The criticality of $\tau_k$ plays an important role in computing $\sigma_k(t)$.

If $\tau_k$ is a critical task, we have to schedule it, possibly at the cost of rejecting some of the non-critical tasks. Hence, $\sigma_k(t) = S_{k-1}(t') \oplus \tau_k$ for some schedule $S_{k-1}(t')$, where $\oplus$ denotes concatenation of the sequence and the task. The finish time of $S_{k-1}(t')$ must be no more than $t - c_k$ in order to accommodate $\tau_k$, which leads to $t' \le t - c_k$. The best candidate is $\sigma_{k-1}(t - c_k)$. Hence,

$$\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k, \tag{1}$$

which can be seen in Figure 2. Note that $\sigma_k(t)$ only exists for a proper range of $t$. That is, $\sigma_k(t)$ is infeasible when $t$ is beyond the proper range, e.g., $t < r_k + c_k$, or if $\sigma_{k-1}(t - c_k)$ is infeasible. The range will be considered in detail later.

If $\tau_k$ is non-critical, our concern is to obtain as large a weight for the schedule as possible, while the critical tasks accepted previously must be kept in the schedule. Computation of $\sigma_k(t)$ is based upon the choice between including $\tau_k$ or not. That is,

$$\sigma_k(t) = \begin{cases} \sigma_{k-1}(t - c_k) \oplus \tau_k, & \text{or} \\ \sigma_{k-1}(t), & \end{cases} \tag{2}$$

which can be seen in Figure 2. The factors for making the choice are the feasibility and the weights of the two candidate schedules. That is, the chosen schedule has to be feasible in the first place, and must have a weight greater than or equal to that of the other.

Figure 2: Scheduling for $\tau_k$
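Equations 1 and 2 translate into a direct dynamic program. The Python sketch below is a simplification under stated assumptions: times are integers on the grid $0, \ldots, H$ rather than real numbers, $\tau_k$ is taken to finish no later than $\min(t, d_k)$ (our reading of the "proper range" of $t$), a weight of `None` marks a critical task, and task indices are 0-based. The names are hypothetical; the paper itself evaluates $\sigma_k(t)$ only at the scheduling points introduced in the next subsection.

```python
from typing import List, Optional, Tuple

CRITICAL = None  # hypothetical encoding: weight None marks a critical task

Task = Tuple[int, int, int, Optional[int]]          # (ready, comp, deadline, weight)
Schedule = Optional[Tuple[int, Tuple[int, ...]]]    # (weight W, task indices) or None if infeasible


def optimal_schedules(mu: List[Task], horizon: int) -> List[List[Schedule]]:
    """sigma[k][t] for k = 0..n and integer t = 0..horizon, via Equations (1) and (2)."""
    sigma: List[List[Schedule]] = [[(0, ())] * (horizon + 1)]  # sigma_0(t): empty, weight 0
    for k, (ready, comp, deadline, weight) in enumerate(mu):
        prev = sigma[-1]
        row: List[Schedule] = []
        for t in range(horizon + 1):
            # tau_k must finish by its deadline, so it finishes no later than min(t, d_k)
            start = min(t, deadline) - comp
            with_k: Schedule = None
            if start >= max(ready, 0) and prev[start] is not None:
                w_prev, seq = prev[start]
                gain = 0 if weight is CRITICAL else weight
                with_k = (w_prev + gain, seq + (k,))   # sigma_{k-1}(t - c_k) (+) tau_k
            if weight is CRITICAL:
                row.append(with_k)                      # Equation (1): tau_k must be included
            else:
                options = [s for s in (with_k, prev[t]) if s is not None]
                row.append(max(options, key=lambda s: s[0]) if options else None)  # Equation (2)
        sigma.append(row)
    return sigma
```

For the hypothetical tasks (0, 3, 10, 10), (0, 2, 10, 5), and (4, 2, 8, CRITICAL), the sketch finds no feasible $\sigma_3(t)$ for $t < 6$, the schedule $(\tau_1, \tau_3)$ with weight 10 at $t = 6$, and $(\tau_1, \tau_2, \tau_3)$ with weight 15 for $t \ge 7$.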


3.2  Time Instants for Computing $\sigma_k(t)$

From Equations 1 and 2, the computation of $\sigma_k(t)$ is based on the results of $\sigma_{k-1}(t)$ and $\sigma_{k-1}(t - c_k)$. We do not need to consider all possible values of $t$. The simple example in Figure 3 shows how to determine the time instants $t$; the ready times, computation times, deadlines, and weights of the tasks in $\mu_3 = (\tau_1, \tau_2, \tau_3)$ are given in the figure.

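The following schedules for $\mu_3$ can be easily verified:

$$\begin{array}{lll}
\sigma_3(t) = \text{INFEASIBLE} & & \text{for } t < 6 \\
\sigma_3(t) = (\tau_3) & W(\sigma_3(t)) = 0 & \text{for } 6 \le t < 7.5 \\
\sigma_3(t) = (\tau_2, \tau_3) & W(\sigma_3(t)) = 5 & \text{for } 7.5 \le t < 9 \\
\sigma_3(t) = (\tau_1, \tau_3) & W(\sigma_3(t)) = 10 & \text{for } 9 \le t
\end{array}$$

Here $w_1 = 10$, $w_2 = 5$, and $w_3 = \text{CRITICAL}$, as labeled in Figure 3.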

In general, there exist a number of subranges within each of which the optimal schedules are identical, as illustrated in Figure 4. We only need to compute the schedules at the time instants that delimit the subranges, i.e., 6, 7.5, and 9. We call these time instants scheduling points. The scheduling points can be determined from the timing characteristics of the tasks.


Figure 4: Identical subranges (time axis marked at 0, 6, 7.5, 9, and 12)

3.3  Definition  of  Scheduling  Points 

We denote the $j$th scheduling point for $\mu_k$ by $\lambda_{k,j}$, and call $j$ the index of $\lambda_{k,j}$. Hence, $\sigma_k(\lambda_{k,j})$ denotes an optimal schedule for $\mu_k$ at the scheduling point $\lambda_{k,j}$. Let $v_k$ be the total number of scheduling points at which we need to schedule $\mu_k$. For simplicity, $\Lambda_k$ denotes the set $\{\lambda_{k,1}, \lambda_{k,2}, \ldots, \lambda_{k,v_k}\}$, and $\Sigma_k$ the set $\{\sigma_k(\lambda_{k,1}), \sigma_k(\lambda_{k,2}), \ldots, \sigma_k(\lambda_{k,v_k})\}$. The scheduling points are defined as follows.

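Definition: The set of scheduling points, $\Lambda_k$, is complete if and only if:

1. for any $t < \lambda_{k,1}$, $\mathcal{S}_k(t)$ is empty,

2. for any $\lambda_{k,j} \le t < \lambda_{k,j+1}$, $j = 1, \ldots, v_k - 1$, $\sigma_k(\lambda_{k,j}) \in \mathcal{S}_k(t)$, and

3. for any $t \ge \lambda_{k,v_k}$, $\sigma_k(\lambda_{k,v_k}) \in \mathcal{S}_k(t)$.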

Note that $\mathcal{S}_k(t)$ being empty means that there is no feasible schedule with finish time less than or equal to $t$. Also recall that $\sigma_k(\lambda_{k,j}) \in \mathcal{S}_k(t)$ means that $\sigma_k(\lambda_{k,j})$ is an optimal schedule for $\mu_k$ at $t$. The completeness of scheduling points indicates that all of the optimal schedules over the positive real time domain can be represented by the optimal schedules computed at the scheduling points. In addition, the set of scheduling points $\Lambda_k$ is minimum if and only if $W(\sigma_k(\lambda_{k,j})) < W(\sigma_k(\lambda_{k,j+1}))$ for any $1 \le j \le v_k - 1$. This ensures that there is no redundant scheduling point, i.e., one that could be removed without violating the completeness of the scheduling points. The sets of scheduling points that we discuss are complete and minimum.

Figure 3: $\mu_3 = (\tau_1, \tau_2, \tau_3)$
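Completeness means that the scheduling points act as the breakpoints of a step function, so $\sigma_k(t)$ for any $t$ can be looked up by binary search over $\Lambda_k$; minimality means the weights at the points strictly increase. The Python sketch below, with hypothetical names, checks both properties on the schedules of $\mu_3$ from Figure 3 (points 6, 7.5, and 9 with weights 0, 5, and 10).

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def schedule_at(points: Sequence[float], schedules: Sequence[Tuple[str, ...]], t: float) -> Optional[Tuple[str, ...]]:
    """Optimal schedule at time t from a complete set of scheduling points.

    Before the first point no feasible schedule exists; between consecutive points
    (and after the last one) the schedule at the latest point <= t is optimal."""
    j = bisect_right(points, t) - 1
    return None if j < 0 else schedules[j]


def is_minimum(weights: Sequence[int]) -> bool:
    """Minimality: weights at consecutive scheduling points strictly increase."""
    return all(a < b for a, b in zip(weights, weights[1:]))


# Scheduling points, schedules, and weights of mu_3 from Figure 3
POINTS = [6, 7.5, 9]
SCHEDULES = [("tau3",), ("tau2", "tau3"), ("tau1", "tau3")]
WEIGHTS = [0, 5, 10]
```

Looking up $t = 8$ returns $(\tau_2, \tau_3)$, and any $t < 6$ returns `None`, matching the table of $\sigma_3(t)$ above.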


3.4  An  Example  for  Deriving  Scheduling  Points 


The values of $\Lambda_k$ depend on the temporal relations between $\tau_k$ and $\Lambda_{k-1}$. The example in Figure 5 is used to illustrate these relations. Here we only describe the idea of deriving scheduling points through the example; the details are discussed later. Assume that there are 5 scheduling points for $\mu_{k-1}$, and consider computing $\Sigma_k$ based on $\Sigma_{k-1}$. The current task $\tau_k$ may be critical or non-critical.

Figure 5: Scheduling points (upper time axis: $\lambda_{k-1,1}, \ldots, \lambda_{k-1,5}$ for $\mu_{k-1}$; lower time axis: $r_k + c_k$, $\lambda_{k-1,2} + c_k$, $\lambda_{k-1,3} + c_k$ for $\mu_k$)


First, let us assume that $\tau_k$ is critical, which means that $\tau_k$ must be the last task in any feasible schedule for $\mu_k$. A schedule for $\mu_k$ is thus a schedule for $\mu_{k-1}$ concatenated with $\tau_k$. Hence, the optimal schedules for $\mu_k$ can be computed by appending $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,j})$, $j = 1, \ldots, v_{k-1}$. One restriction is that $\tau_k$ must be able to execute within its time window, from $r_k$ to $d_k$. Hence, the scheduling points are $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the timing constraints of $\tau_k$. In the example, because $r_k > \lambda_{k-1,1}$, the first scheduling point is $\lambda_{k,1} = r_k + c_k$. The first and the remaining scheduling points are expressed in Equations 3-5. Notice that $\lambda_{k-1,4} + c_k > d_k$; hence, there are only 3 scheduling points for $\mu_k$.

$$\lambda_{k,1} = r_k + c_k \quad \text{and} \quad \sigma_k(\lambda_{k,1}) = \sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k \tag{3}$$

$$\lambda_{k,2} = \lambda_{k-1,2} + c_k \quad \text{and} \quad \sigma_k(\lambda_{k,2}) = \sigma_{k-1}(\lambda_{k-1,2}) \oplus \tau_k \tag{4}$$

$$\lambda_{k,3} = \lambda_{k-1,3} + c_k \quad \text{and} \quad \sigma_k(\lambda_{k,3}) = \sigma_{k-1}(\lambda_{k-1,3}) \oplus \tau_k \tag{5}$$

On the other hand, let us assume that $\tau_k$ is non-critical. As a non-critical task, $\tau_k$ is not necessarily included in the schedule of $\mu_k$. Whether to include $\tau_k$ depends on how much weight may be gained by including it. If $\tau_k$ is included in the schedules, the new possible scheduling points for $\mu_k$ are expressed in Equations 6-8.

$$\lambda'_{k,1} = r_k + c_k \quad \text{and} \quad \sigma'_k(\lambda'_{k,1}) = \sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k \tag{6}$$

$$\lambda'_{k,2} = \lambda_{k-1,2} + c_k \quad \text{and} \quad \sigma'_k(\lambda'_{k,2}) = \sigma_{k-1}(\lambda_{k-1,2}) \oplus \tau_k \tag{7}$$

$$\lambda'_{k,3} = \lambda_{k-1,3} + c_k \quad \text{and} \quad \sigma'_k(\lambda'_{k,3}) = \sigma_{k-1}(\lambda_{k-1,3}) \oplus \tau_k \tag{8}$$

If $\tau_k$ is not included, the scheduling points for $\mu_k$ are $\lambda_{k-1,j}$, $j = 1, \ldots, v_{k-1}$. The scheduling points for $\mu_k$ can be derived by first merging and sorting $\Lambda'_k$ and $\Lambda_{k-1}$, which gives

$$\lambda_{k-1,1},\ \lambda_{k-1,2},\ \lambda'_{k,1},\ \lambda_{k-1,3},\ \lambda_{k-1,4},\ \lambda_{k-1,5}. \tag{9}$$

Then, the resultant array of scheduling points must follow the rule that the weights of the optimal schedules at the scheduling points in Equation 9 are strictly increasing; we remove scheduling points as necessary to enforce this rule.

3.5  Deriving  Scheduling  Points 

By the example illustrated in Figure 5, $\Lambda_k$ can be derived from $\Lambda_{k-1}$ and $\tau_k$. Note that a scheduling point indicates the finish time of a schedule. If we want to append $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,j})$, $\tau_k$ cannot be started before $\lambda_{k-1,j}$. This implies that $\Lambda_k$ can be determined by the temporal relations between $\Lambda_{k-1}$, i.e., the finish times of the schedules in $\Sigma_{k-1}$, and the start time of $\tau_k$. Specifically, we need to explore the temporal relations between the earliest start time $r_k$ and the latest start time $d_k - c_k$ of $\tau_k$, and the lower and upper bounds defined below. We define the lower bound $L_{k-1} = \lambda_{k-1,1}$ and the upper bound $U_{k-1} = \lambda_{k-1,v_{k-1}}$. In particular, they have the following meanings.

$L_{k-1}$: the largest time instant such that there is no feasible schedule for $\mu_{k-1}$ with finish time less than $L_{k-1}$.

$U_{k-1}$: the least time instant such that an optimal schedule for $\mu_{k-1}$ with finish time greater than $U_{k-1}$ can be $\sigma_{k-1}(U_{k-1})$.

The six possible temporal relations in Equations 10-15 can be used to determine $\Lambda_k$.

$$d_k - c_k < L_{k-1} < U_{k-1} \tag{10}$$

$$r_k < L_{k-1} < d_k - c_k < U_{k-1} \tag{11}$$

$$L_{k-1} < r_k < d_k - c_k < U_{k-1} \tag{12}$$

$$r_k < L_{k-1} < U_{k-1} < d_k - c_k \tag{13}$$

$$L_{k-1} < r_k < U_{k-1} < d_k - c_k \tag{14}$$

$$L_{k-1} < U_{k-1} < r_k \tag{15}$$


The temporal relations are illustrated in Figure 6 and can be summarized in three cases. The method for constructing scheduling points according to the temporal relations is discussed next. The correctness of the method, i.e., the completeness and minimality of the scheduling points, is verified later.

3.5.1  $\tau_k$ is Critical

The task $\tau_k$ must be the last task in any feasible schedule of $\mu_k$. Recall that $\sigma_k(t)$ can be computed by Equation 1. In the following, we discuss how to derive the scheduling points for the three cases. Readers may refer to the algorithm in Section 3.7 for details.

Case 1 ($d_k - c_k < L_{k-1}$): $\mu_k$ is not feasible. Recall that, due to the completeness of scheduling points, there exists no feasible schedule for $\mu_{k-1}$ with finish time less than $L_{k-1}$, and that $d_k - c_k$ is the latest start time for $\tau_k$. Hence, $\mu_k$ is not feasible, and thus the whole permutation $\mu$ is not feasible.
Figure 6: Temporal relations

Case 2 ($r_k < L_{k-1} < d_k - c_k$ or $L_{k-1} < r_k < U_{k-1}$): The scheduling points for $\mu_k$ are the set of $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the constraints that $\tau_k$ must start after $r_k$ and finish by $d_k$. Specifically, $\Lambda_k$ can be derived by Equations 16 and 17:

$$\lambda_{k,1} = \max(\lambda_{k-1,1} + c_k,\ r_k + c_k) \tag{16}$$

Let $j_{\min}$ and $j_{\max}$ denote the smallest and the largest integers $j$ satisfying $\lambda_{k,1} < \lambda_{k-1,j} + c_k \le d_k$. The rest of the scheduling points can be computed by

$$\lambda_{k,i} = \lambda_{k-1,j} + c_k, \quad \text{where } j_{\min} \le j \le j_{\max} \text{ and } i = j - j_{\min} + 2 \tag{17}$$

Note that $v_k = j_{\max} - j_{\min} + 2$. The example given in Figure 5 falls in this case.

Case 3 ($U_{k-1} < r_k$): There is only one scheduling point. Since $r_k$ is the earliest start time for $\tau_k$, the only scheduling point is $r_k + c_k$.
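The three cases for a critical $\tau_k$ can be sketched in Python as follows. Each scheduling point is stored as a hypothetical (time, weight, schedule) triple, sorted by time, and all names are illustrative. One assumption goes beyond the equations: for the first point of Case 2 we attach the best schedule of $\mu_{k-1}$ finishing by $\lambda_{k,1} - c_k$ (found by binary search), which reduces to $\sigma_{k-1}(\lambda_{k-1,1}) \oplus \tau_k$ in the example of Figure 5. Since critical tasks contribute nothing to $W$, the weights carry over unchanged.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# A scheduling point: (lambda, W(sigma), sigma as a tuple of task names)
Point = Tuple[float, int, Tuple[str, ...]]


def critical_points(prev: List[Point], name: str, r: float, c: float, d: float) -> Optional[List[Point]]:
    """Scheduling points of mu_k when tau_k is critical (cases of Section 3.5.1).

    `prev` is the complete, minimum list of scheduling points of mu_{k-1},
    sorted by time; returns None when mu_k (and hence mu) is infeasible."""
    lower, upper = prev[0][0], prev[-1][0]          # L_{k-1}, U_{k-1}
    if d - c < lower:                                # Case 1
        return None
    if upper < r:                                    # Case 3: single point r_k + c_k
        _, w, seq = prev[-1]
        return [(r + c, w, seq + (name,))]
    first = max(lower + c, r + c)                    # Case 2, Equation (16)
    if first > d:
        return None
    times = [p[0] for p in prev]
    j0 = bisect_right(times, first - c) - 1          # best schedule of mu_{k-1} finishing by first - c
    _, w0, seq0 = prev[j0]
    result = [(first, w0, seq0 + (name,))]
    for lam, w, seq in prev:                         # Equation (17)
        if first < lam + c <= d:
            result.append((lam + c, w, seq + (name,)))
    return result
```

With five hypothetical points 2, 4, 6, 9, 11 for $\mu_{k-1}$ and $r_k = 3$, $c_k = 2$, $d_k = 10$, the sketch returns the three points 5, 6, and 8, matching the pattern of Equations 3-5, where $\lambda_{k-1,4} + c_k$ already exceeds $d_k$.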

3.5.2  $\tau_k$ is Non-critical

Recall that $\sigma_k(t)$ can be computed by Equation 2. The non-critical task $\tau_k$ is not necessarily included in the schedule for $\mu_k$. Whether to include $\tau_k$ depends on how much weight may be gained by including it. Let us consider the three cases.


Case 1 ($d_k - c_k < L_{k-1}$): do nothing. The latest start time of $\tau_k$ is less than the lower bound $L_{k-1}$; hence, $\tau_k$ cannot be included in any feasible schedule. The scheduling points and schedules for $\mu_k$ remain the same as those for $\mu_{k-1}$. In our implementation, to save time and space, $\Lambda_{k-1}$ and $\Lambda_k$ use the same memory space; likewise, $\Sigma_{k-1}$ and $\Sigma_k$ use the same memory space. So now $\Lambda_k = \Lambda_{k-1}$ and $\Sigma_k = \Sigma_{k-1}$.

Case 2 ($r_k < L_{k-1} \le d_k - c_k$ or $L_{k-1} \le r_k \le U_{k-1}$): If $\tau_k$ is included, the new possible scheduling points for $\mu_k$ are the set of $\lambda_{k-1,j} + c_k$, $j = 1, \ldots, v_{k-1}$, subject to the constraints that $\tau_k$ must start after $r_k$ and finish before $d_k$. Specifically, the new possible scheduling points, $\lambda'_k$, can be derived by Equations 18 and 19.

$$\lambda'_{k,1} = \max(\lambda_{k-1,1} + c_k,\; r_k + c_k) \qquad (18)$$

Let $J_{\min}$ and $J_{\max}$ denote the smallest and the largest integers of $j$ satisfying $\lambda'_{k,1} < \lambda_{k-1,j} + c_k \le d_k$. The rest of the scheduling points are

$$\lambda'_{k,i} = \lambda_{k-1,j} + c_k, \quad \text{where } J_{\min} \le j \le J_{\max} \text{ and } i = j - J_{\min} + 2 \qquad (19)$$

If $\tau_k$ is not included, the scheduling points for $\mu_k$ are the old ones for $\mu_{k-1}$, i.e.,

$$\lambda_{k-1,i}, \quad i = 1, \ldots, v_{k-1}. \qquad (20)$$

It is worth mentioning that some optimal schedules may include $\tau_k$, and some may not. The scheduling points, $\lambda_k$, can be derived by the following two steps.

1. Merge and sort the two arrays of scheduling points, $\lambda'_k$ and $\lambda_{k-1}$, in Equations 18-20.

2. The resultant array of scheduling points should follow the rule that the weights of the optimal schedules at the scheduling points are strictly increasing. We remove any scheduling point whose weight is not larger than that of its preceding (retained) scheduling point in the array.

The  example  given  in  Figure  5  falls  in  this  case. 
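The two steps can be sketched as a single pass over the merged array. This is a minimal illustration, not the authors' code: the tuple representation (finish time, weight, schedule) and the name `merge_and_prune` are our assumptions, and so is the tie-breaking rule that places the heavier schedule first when two points share a finish time, so that the lighter duplicate is the one removed.

```python
from typing import List, Tuple

Point = Tuple[int, int, Tuple[str, ...]]   # (lambda, W(sigma), sigma), illustrative

def merge_and_prune(without_k: List[Point], with_k: List[Point]) -> List[Point]:
    """Step 1: merge and sort by finish time; step 2: keep strictly increasing weights."""
    # Tie-breaking assumption: on equal finish times the heavier schedule comes first,
    # so the lighter duplicate is the one removed.
    merged = sorted(without_k + with_k, key=lambda p: (p[0], -p[1]))
    result: List[Point] = []
    for point in merged:
        # Drop a point whose weight does not exceed that of the last retained point.
        if not result or point[1] > result[-1][1]:
            result.append(point)
    return result
```

Merging `[(2, 1, ...), (6, 3, ...)]` (without $\tau_k$) with `[(4, 5, ...), (8, 7, ...)]` (with $\tau_k$) keeps the points at 2, 4 and 8 and drops the one at 6, whose weight 3 does not exceed 5.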

Case 3 ($U_{k-1} < r_k$): add one more scheduling point. The earliest start time of $\tau_k$ is greater than the upper bound, $U_{k-1}$; hence, the new scheduling point is $r_k + c_k$. The weight of the optimal schedule computed at this scheduling point is $W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}})) + w_k$, which is larger than $W(\sigma_{k-1}(\lambda_{k-1,v_{k-1}}))$. So this scheduling point must be included to make the set of scheduling points for $\mu_k$ complete. Note again that the scheduling points and schedules for $\mu_{k-1}$ remain unchanged as the scheduling points and schedules for $\mu_k$; i.e., $\lambda_{k,j} = \lambda_{k-1,j}$ and $\sigma_k(\lambda_{k,j}) = \sigma_{k-1}(\lambda_{k-1,j})$, for $j = 1, \ldots, v_{k-1}$. However, $\lambda_{k,v_k} = r_k + c_k$ and $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,v_{k-1}}) \oplus \tau_k$, where $v_k = v_{k-1} + 1$.

3.6 Completeness and Minimization of Scheduling Points

We would like to show that the sets of scheduling points derived in the three cases are complete and minimum. Note that Cases 1 and 3 are special cases and are not difficult to verify. Hence, we will only briefly discuss Case 2. If $\tau_k$ is critical, we would like to show that if $\lambda_{k-1}$ is complete and minimum, then $\lambda_k$ derived by Equations 16 and 17 is also complete and minimum.

Condition 1 of completeness: Due to the completeness of $\lambda_{k-1}$, $\Sigma_{k-1}(t)$ is empty when $t < \lambda_{k-1,1}$. Equivalently, $\Sigma_{k-1}(t - c_k)$ is empty when $t < \lambda_{k-1,1} + c_k$. According to Equation 1, $\sigma_k(t) = \sigma_{k-1}(t - c_k) \oplus \tau_k$. Hence, $\sigma_k(t)$ does not exist when $t < \lambda_{k-1,1} + c_k$. On the other hand, since $\tau_k$ is critical, $\sigma_k(t)$ does not exist when $t < r_k + c_k$, which is the earliest finish time of $\tau_k$. Therefore, by Equation 16, $\Sigma_k(t)$ is empty when $t < \lambda_{k,1}$. This shows that condition 1 of the definition of completeness is satisfied.

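Condition 2 of completeness: Due to the completeness of $\lambda_{k-1}$, $\sigma_{k-1}(\lambda_{k-1,j}) \in \Sigma_{k-1}(t)$ for any $\lambda_{k-1,j} \le t < \lambda_{k-1,j+1}$. By Equation 1, $\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$ is an optimal schedule at $\lambda_{k-1,j} + c_k$ for $\mu_k$. Hence, $\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k \in \Sigma_k(t)$ for $\lambda_{k-1,j} + c_k \le t < \lambda_{k-1,j+1} + c_k$. By Equation 17, $\lambda_{k,i} = \lambda_{k-1,j} + c_k$ for $i = j - J_{\min} + 2$, which indicates that $\sigma_k(\lambda_{k,i}) = \sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k$. Besides, $\lambda_{k,i+1} = \lambda_{k-1,j+1} + c_k$ for $i + 1 = j + 1 - J_{\min} + 2$, by Equation 17. Therefore, $\sigma_k(\lambda_{k,i}) \in \Sigma_k(t)$ for $\lambda_{k,i} \le t < \lambda_{k,i+1}$. This shows that condition 2 of the definition of completeness is satisfied.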

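Condition 3 of completeness: We know that $v_k = J_{\max} - J_{\min} + 2$. By Equation 17, $\lambda_{k,v_k} = \lambda_{k-1,J_{\max}} + c_k$, which indicates that $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,J_{\max}}) \oplus \tau_k$. Due to the completeness of $\lambda_{k-1}$, $\sigma_{k-1}(\lambda_{k-1,J_{\max}}) \in \Sigma_{k-1}(t)$ for $\lambda_{k-1,J_{\max}} \le t < \lambda_{k-1,J_{\max}+1}$, or simply $\lambda_{k-1,J_{\max}} \le t$ if $J_{\max} = v_{k-1}$. By Equation 1, $\sigma_{k-1}(\lambda_{k-1,J_{\max}}) \oplus \tau_k$ is an optimal schedule at $\lambda_{k-1,J_{\max}} + c_k$ for $\mu_k$. Hence, $\sigma_{k-1}(\lambda_{k-1,J_{\max}}) \oplus \tau_k \in \Sigma_k(t)$ for $\lambda_{k-1,J_{\max}} + c_k \le t$. Note that the upper limit $t < \lambda_{k-1,J_{\max}+1} + c_k$ is removed: because $J_{\max}$ is the largest integer of $j$ satisfying $\lambda_{k-1,j} + c_k \le d_k$, appending $\tau_k$ to $\sigma_{k-1}(\lambda_{k-1,J_{\max}+1})$ would finish after $d_k$, and the schedule would not be feasible. Since $\sigma_k(\lambda_{k,v_k}) = \sigma_{k-1}(\lambda_{k-1,J_{\max}}) \oplus \tau_k$, we have $\sigma_k(\lambda_{k,v_k}) \in \Sigma_k(t)$ for $\lambda_{k,v_k} \le t$. This shows that condition 3 of the definition of completeness is satisfied.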

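Minimization: By Equation 1, $W(\sigma_k(t)) = W(\sigma_{k-1}(t - c_k) \oplus \tau_k) = W(\sigma_{k-1}(t - c_k))$, since a critical task has no weight. Because $\lambda_{k-1}$ is minimum, $W(\sigma_{k-1}(\lambda_{k-1,j})) < W(\sigma_{k-1}(\lambda_{k-1,j+1}))$ for any $1 \le j \le v_{k-1} - 1$. That is, $W(\sigma_{k-1}(\lambda_{k-1,j}) \oplus \tau_k) < W(\sigma_{k-1}(\lambda_{k-1,j+1}) \oplus \tau_k)$ for any $1 \le j \le v_{k-1} - 1$. By Equations 16 and 17, $W(\sigma_k(\lambda_{k-1,j} + c_k)) < W(\sigma_k(\lambda_{k-1,j+1} + c_k))$, and thus $W(\sigma_k(\lambda_{k,i})) < W(\sigma_k(\lambda_{k,i+1}))$ for any $1 \le i \le v_k - 1$. This shows that $\lambda_k$ is minimum.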

If $\tau_k$ is non-critical, $\tau_k$ may or may not be included in the optimal schedules for $\mu_k$. Assuming that $\tau_k$ is not included in any of the optimal schedules, $\lambda_k = \lambda_{k-1}$ is complete, since $\lambda_{k-1}$ is complete. However, including $\tau_k$ may gain more weight, so we also need to consider the schedules that include $\tau_k$. If $\tau_k$ is included in the optimal schedules, $\lambda'_k$ derived by Equations 18 and 19 is the complete set of scheduling points for the optimal schedules including $\tau_k$, by the same reasoning as for the critical task. Hence, it is sufficient to construct the complete set $\lambda_k$ by selecting from $\lambda'_k$ and $\lambda_{k-1}$. Since whether or not $\tau_k$ is included does not affect the feasibility of the schedules, we only need to consider the weights of the optimal schedules. In a complete set of scheduling points, the weights of the optimal schedules at these scheduling points are non-decreasing. Furthermore, in a complete and minimum set of scheduling points, these weights are strictly increasing. Hence, we can merge and sort the two arrays $\lambda'_k$ and $\lambda_{k-1}$, and remove any scheduling point whose weight is not larger than that of its preceding (retained) scheduling point in the array. The resultant set of scheduling points is thus complete and minimum.
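Completeness and minimality are what make a lookup of $\sigma_k(t)$ cheap: the optimal schedule at any time $t$ is the one stored at the last scheduling point not exceeding $t$, and none exists before $\lambda_{k,1}$. The following is a minimal sketch of that reading, assuming the same illustrative (finish time, weight, schedule) tuples; the names `optimal_at` and `is_minimum` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Point = Tuple[int, int, Tuple[str, ...]]   # (lambda, W(sigma), sigma), illustrative

def optimal_at(points: List[Point], t: int) -> Optional[Point]:
    """Point whose schedule is optimal at time t (conditions 1-3), or None."""
    times = [p[0] for p in points]
    i = bisect.bisect_right(times, t)        # number of points with lambda <= t
    return points[i - 1] if i > 0 else None  # condition 1: nothing before lambda_{k,1}

def is_minimum(points: List[Point]) -> bool:
    """Minimality: weights strictly increase along the scheduling points."""
    return all(a[1] < b[1] for a, b in zip(points, points[1:]))
```

Because the weights strictly increase, the last scheduling point always carries the heaviest schedule, which is why it can serve as the answer for any $t \ge \lambda_{k,v_k}$.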

3.7  The  Permutation  Scheduling  Algorithm  (PSA) 

Algorithm PSA:

Input: a permutation sequence μ = (τ_1, τ_2, ..., τ_n)
Output: an optimal schedule

Initialization: v_0 = 1; λ_{0,1} = 0; σ_0(λ_{0,1}) = (); W(σ_0(λ_{0,1})) = 0
for k = 1 to n
    when τ_k is critical
        case 1 (d_k - c_k < L_{k-1}): (μ is not feasible)
            exit
        case 2 (r_k < L_{k-1} ≤ d_k - c_k) or (L_{k-1} ≤ r_k ≤ U_{k-1}):
            Computation for the first scheduling point:
                λ_{k,1} = max(λ_{k-1,1} + c_k, r_k + c_k)
                j = 1 if λ_{k-1,1} > r_k; otherwise, j is the greatest integer such that λ_{k-1,j} ≤ r_k
                σ_k(λ_{k,1}) = σ_{k-1}(λ_{k-1,j}) ⊕ τ_k
                W(σ_k(λ_{k,1})) = W(σ_{k-1}(λ_{k-1,j}))
            Loop: j = J_min to J_max, where J_min and J_max denote the smallest and the largest
                  integers of j satisfying λ_{k,1} < λ_{k-1,j} + c_k ≤ d_k
                i = j - J_min + 2
                λ_{k,i} = λ_{k-1,j} + c_k
                σ_k(λ_{k,i}) = σ_{k-1}(λ_{k-1,j}) ⊕ τ_k
                W(σ_k(λ_{k,i})) = W(σ_{k-1}(λ_{k-1,j}))
            v_k = J_max - J_min + 2
        case 3 (U_{k-1} < r_k): (only one scheduling point)
            λ_{k,1} = r_k + c_k
            σ_k(λ_{k,1}) = σ_{k-1}(λ_{k-1,v_{k-1}}) ⊕ τ_k
            W(σ_k(λ_{k,1})) = W(σ_{k-1}(λ_{k-1,v_{k-1}}))
            v_k = 1
    when τ_k is non-critical
        case 1 (d_k - c_k < L_{k-1}): (scheduling points and schedules remain the same)
            /* Do nothing; τ_k cannot be included in any feasible schedule */
            /* Hence, λ_k = λ_{k-1} and σ_k = σ_{k-1} */
        case 2 (r_k < L_{k-1} ≤ d_k - c_k) or (L_{k-1} ≤ r_k ≤ U_{k-1}):
            Computation for the first new possible scheduling point:
                λ'_{k,1} = max(λ_{k-1,1} + c_k, r_k + c_k)
                j = 1 if λ_{k-1,1} > r_k; otherwise, j is the greatest integer such that λ_{k-1,j} ≤ r_k
                σ'_k(λ'_{k,1}) = σ_{k-1}(λ_{k-1,j}) ⊕ τ_k
                W(σ'_k(λ'_{k,1})) = W(σ_{k-1}(λ_{k-1,j})) + w_k
            Loop: j = J_min to J_max, where J_min and J_max denote the smallest and the largest
                  integers of j satisfying λ'_{k,1} < λ_{k-1,j} + c_k ≤ d_k
                i = j - J_min + 2
                λ'_{k,i} = λ_{k-1,j} + c_k
                σ'_k(λ'_{k,i}) = σ_{k-1}(λ_{k-1,j}) ⊕ τ_k
                W(σ'_k(λ'_{k,i})) = W(σ_{k-1}(λ_{k-1,j})) + w_k
            construct σ_k from σ_{k-1} and σ'_k by
                1) merging and sorting λ_{k-1} and λ'_k into one array
                2) making the weights of the schedules in the resultant array strictly
                   increasing, removing schedules from the array if necessary
        case 3 (U_{k-1} < r_k): (adding one more scheduling point)
            v_k = v_{k-1} + 1
            λ_{k,v_k} = r_k + c_k
            σ_k(λ_{k,v_k}) = σ_{k-1}(λ_{k-1,v_{k-1}}) ⊕ τ_k
            W(σ_k(λ_{k,v_k})) = W(σ_{k-1}(λ_{k-1,v_{k-1}})) + w_k
            /* Note that λ_{k,j} = λ_{k-1,j} and σ_k(λ_{k,j}) = σ_{k-1}(λ_{k-1,j}) for j = 1 to v_{k-1} */
endfor
/* The optimal schedule for μ is σ_n(λ_{n,v_n}), whose weight is the largest */
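Putting the cases together, PSA can be sketched compactly in Python. This is our own illustration, not the authors' code: the tuple representation of scheduling points, the `Task` fields and the name `psa` are assumptions. The sketch assumes integer times and that every window satisfies $r_k + c_k \le d_k$, folds Equations 16-20 and the merge-and-prune rule into one loop, and returns the last (heaviest) scheduling point.

```python
from typing import List, NamedTuple, Optional, Tuple

class Task(NamedTuple):
    name: str
    r: int                  # release time r_k
    d: int                  # deadline d_k
    c: int                  # computation time c_k
    w: int = 0              # weight w_k (unused for critical tasks)
    critical: bool = False

Point = Tuple[int, int, Tuple[str, ...]]   # (lambda, W(sigma), sigma)

def psa(perm: List[Task]) -> Optional[Point]:
    """Return the heaviest feasible schedule for the permutation, or None."""
    points: List[Point] = [(0, 0, ())]      # v_0 = 1, lambda_{0,1} = 0, sigma_0 = ()
    for t in perm:
        lower, upper = points[0][0], points[-1][0]   # L_{k-1}, U_{k-1}
        gain = 0 if t.critical else t.w
        if t.d - t.c < lower:                        # case 1
            if t.critical:
                return None                          # mu is not feasible
            continue                                 # points stay the same
        if upper < t.r:                              # case 3
            _, w, s = points[-1]
            new = (t.r + t.c, w + gain, s + (t.name,))
            points = [new] if t.critical else points + [new]
            continue
        # case 2: points that include tau_k (Equations 16-17 or 18-19)
        base = max((j for j, p in enumerate(points) if p[0] <= t.r), default=0)
        first = max(points[0][0], t.r) + t.c
        cand = [(first, points[base][1] + gain, points[base][2] + (t.name,))]
        cand += [(lam + t.c, w + gain, s + (t.name,))
                 for lam, w, s in points if first < lam + t.c <= t.d]
        if t.critical:
            points = cand
        else:                                        # Equation 20 + merge-and-prune
            merged = sorted(points + cand, key=lambda p: (p[0], -p[1]))
            points = []
            for p in merged:
                if not points or p[1] > points[-1][1]:
                    points.append(p)
    return points[-1]                                # weights strictly increase

if __name__ == "__main__":
    perm = [Task("A", 0, 4, 2, critical=True), Task("B", 0, 6, 3, w=5), Task("C", 1, 5, 2, w=3)]
    print(psa(perm))  # (5, 5, ('A', 'B'))
```

In the usage example, B and C cannot both follow A within their deadlines, so PSA keeps the heavier B; the scheduling point that includes C (finish time 4, weight 3) survives as an intermediate point but is not the final answer.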

4  Scheduling  a  Task  Set 

To find an optimal schedule for the task set, we may have to consider all $n!$ possible permutations. It is possible to reduce the search space by eliminating some infeasible permutations. For example, if $d_i < r_j$, there is no feasible schedule in which $\tau_i$ is placed after $\tau_j$. Even after this reduction, the search space might still be too large. We propose to use the simulated annealing technique, recognizing that while this technique reduces the search, it may yield sub-optimal results.
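The pruning rule can be sketched as a filter over permutations. This is a minimal illustration under our assumptions: the `Window` type and the name `respects_windows` are hypothetical, and the check is a plain $O(n^2)$ scan of every ordered pair in the permutation.

```python
import itertools
from typing import List, NamedTuple

class Window(NamedTuple):
    name: str
    r: int  # release time
    d: int  # deadline

def respects_windows(perm: List[Window]) -> bool:
    """False if some tau_i is placed after some tau_j although d_i < r_j."""
    for pos, later in enumerate(perm):
        for earlier in perm[:pos]:
            if later.d < earlier.r:
                return False
    return True

tasks = [Window("a", 0, 3), Window("b", 5, 9), Window("c", 2, 8)]
kept = [p for p in itertools.permutations(tasks) if respects_windows(list(p))]
print(len(kept))  # 3 of the 6 permutations remain: only those with a before b
```

Here only one pair is constrained ($d_a = 3 < r_b = 5$), so half of the permutations are pruned; with 100 tasks the surviving space is still far too large to enumerate, which motivates simulated annealing.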

4.1  Simulated  Annealing 

Simulated annealing is a stochastic approach for solving large optimization problems. It was developed using ideas from statistical mechanics to find a global minimum point in the energy space. Kirkpatrick et al. [5] demonstrated the power of simulated annealing and its applications to the field of combinatorial optimization.

Finding the optimal solution of an optimization problem is similar to finding the lowest energy state of a metal. The metal is melted first. Then it is cooled down slowly until the freezing point is reached. At each temperature, a number of trials are carried out to reach equilibrium. The temperature must not drop too quickly; otherwise, the metal may be trapped in a local minimum energy configuration. Lower energy generally indicates a better solution. The annealing process starts from a randomly chosen configuration and proceeds to seek potentially promising neighbor configurations. A neighbor configuration is derived by perturbing the current configuration. If the neighbor configuration has a lower energy, the change is always accepted. The distinctive feature is that a neighbor configuration with a higher energy can also be accepted, with probability $e^{(E - E')/T}$, where $T$ is the temperature and $E - E'$ represents the difference in the energy of the current and neighbor configurations. Notice that an energy up jump is more likely to be accepted when the temperature is high than when it is low; accepting it lets the search reach a configuration that, although of higher energy, may lead to a better solution. An up jump means a jump from low energy to high energy, and a down jump means a jump from high energy to low energy.
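The acceptance rule can be sketched in a few lines of Python. This is a minimal illustration of the standard Metropolis test described above; the function names `up_jump_probability` and `accept` are our own.

```python
import math
import random

def up_jump_probability(E: float, E_new: float, T: float) -> float:
    """Probability e^{(E - E')/T} of accepting a neighbor with E' >= E."""
    return math.exp((E - E_new) / T)

def accept(E: float, E_new: float, T: float, rng: random.Random) -> bool:
    if E_new < E:                       # down jump: always accepted
        return True
    return up_jump_probability(E, E_new, T) > rng.random()

# Same up jump of 10 energy units at a high and a low temperature.
print(up_jump_probability(0, 10, 100.0))   # about 0.905
print(up_jump_probability(0, 10, 1.0))     # about 4.5e-05
```

The two printed probabilities illustrate why up jumps are common early in the annealing process and become rare once the temperature has dropped.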

4.2  The  Set  Scheduling  Algorithm  (SSA) 

A permutation is used to represent the configuration. If a permutation is ordered in an Earliest Deadline First (EDF) fashion, we call it an EDF permutation. An EDF permutation may be a good starting permutation for the simulated annealing process for this problem. If the window of a task is contained in the window of another task, we say that the latter task contains the former task. If there are no containing relations among tasks, the EDF permutation is a permutation of which an optimal schedule of the task set is a subsequence [4]. Thus, an optimal schedule for the task set can be generated by PSA by scheduling the EDF permutation. The energy function can be expressed by a loss function:

$$\text{loss} = \sum \text{weight of rejected non-critical tasks}$$

A schedule is not acceptable if critical tasks are rejected. We may say that the loss of a rejected critical task is infinite. However, this kind of assignment makes it difficult to distinguish between a very bad schedule (e.g., one critical task is rejected) and an even worse schedule (more critical tasks are rejected). In general, the former schedule can be considered an improvement over the latter. If the loss incurred by a rejected critical task is infinite, there is no way to tell which is better between the schedule in which one critical task is rejected and that in which three critical tasks are rejected. Hence, we assign a finite amount of loss to rejected critical tasks. The loss of a critical task must be large enough that the scheduler will not reject a critical task to accommodate a number of non-critical tasks.
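The loss function with a finite critical penalty can be sketched as follows. The penalty of 1000 per rejected critical task is the value used in the experiments of Section 5; the `Task` type and the name `loss` are our own illustrative choices.

```python
from typing import Iterable, NamedTuple

# Finite loss charged per rejected critical task; 1000 is the value used in Section 5.
CRITICAL_PENALTY = 1000

class Task(NamedTuple):
    name: str
    critical: bool
    w: int = 0   # weight, meaningful for non-critical tasks only

def loss(rejected: Iterable[Task]) -> int:
    """Energy of a schedule: weights of rejected non-critical tasks plus penalties."""
    return sum(CRITICAL_PENALTY if t.critical else t.w for t in rejected)

one = [Task("c1", True), Task("n1", False, 40)]
three = [Task("c1", True), Task("c2", True), Task("c3", True)]
print(loss(one), loss(three))   # 1040 3000: one rejection now ranks as better
```

With a finite penalty, a schedule that rejects one critical task (loss 1040) is ranked ahead of one that rejects three (loss 3000), which an infinite penalty could not express.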

The neighbor function may be obtained using one of the following two methods. In the first, simple method, we randomly select one task from those rejected. This task is inserted at a randomly chosen location within a specified distance from its original location, where the distance is the number of tasks between two tasks in a permutation. The distance is used in this approach to control the degree of perturbation.
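The first neighbor method can be sketched as below. This is an illustration under our assumptions: tasks are identified by name strings, the caller passes a non-empty list of rejected tasks, the name `neighbor_random` is hypothetical, and the distance bound is applied to the index difference between the old and new positions.

```python
import random
from typing import List, Sequence

def neighbor_random(perm: Sequence[str], rejected: Sequence[str],
                    max_dist: int, rng: random.Random) -> List[str]:
    """Method 1: move one randomly chosen rejected task to a random position
    at most max_dist places away from where it was (rejected must be non-empty)."""
    new = list(perm)
    task = rng.choice(list(rejected))
    i = new.index(task)
    new.pop(i)
    lo, hi = max(0, i - max_dist), min(len(new), i + max_dist)
    new.insert(rng.randint(lo, hi), task)   # randint includes both end points
    return new
```

A small `max_dist` keeps the perturbation local, which matches the text's use of the distance to control the degree of perturbation.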

A task is rejected because other tasks have been accepted. Given a schedule for a permutation, it is sometimes difficult to identify which task causes the rejection of other tasks, especially when tasks are congested together. However, the task immediately before or after those rejected is likely to play a role. In the second method, we try to identify the task that causes the largest loss of weight. As a simple approach, we attribute the rejection of a task to the task accepted prior to it. Then we choose the task that causes the largest loss of weight and insert it within a specified distance. Due to the robustness of the simulated annealing technique, the impact of not necessarily selecting the task that caused the largest loss is minimal. Note that in simulated annealing many parameters are randomized, and the energy function, together with the temperature, controls the progress of the annealing process. Tindell et al. [9] commented that the great beauty of simulated annealing lies in the fact that you only need to describe what constitutes a good solution without worrying about how to reach it. In our experiments, we find that the first method performs better than the second method. However, the process in the first method sometimes falls into a local minimum. The combination of the two methods performs better than either method alone. The Set Scheduling Algorithm (SSA) is presented in Figure 7.
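The second neighbor method can be sketched as follows. We read "the task accepted prior to it" as the last accepted task that precedes the rejected task in the permutation; that reading, the weight dictionary, and the name `neighbor_blame` are our assumptions rather than details given in the text.

```python
import random
from typing import Dict, List, Optional, Sequence, Set

def neighbor_blame(perm: Sequence[str], accepted: Set[str], weight: Dict[str, int],
                   max_dist: int, rng: random.Random) -> List[str]:
    """Method 2: charge each rejected task's weight to the last accepted task before
    it in perm, then move the most-charged task within max_dist places."""
    blame: Dict[str, int] = {}
    last: Optional[str] = None
    for name in perm:
        if name in accepted:
            last = name
        elif last is not None:
            blame[last] = blame.get(last, 0) + weight[name]
    if not blame:                        # nothing to blame: return an unchanged copy
        return list(perm)
    culprit = max(blame, key=lambda k: blame[k])
    new = list(perm)
    i = new.index(culprit)
    new.pop(i)
    new.insert(rng.randint(max(0, i - max_dist), min(len(new), i + max_dist)), culprit)
    return new
```

For the permutation a, b, c, d, e with a, c and e accepted and rejected weights b = 4 and d = 9, task c is blamed for the larger loss and is the one moved.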

The initial temperature has to be large enough that virtually all up jumps are allowed at the beginning of the annealing process. Following [9], the new temperature is computed as $T_{\text{new}} = a \cdot T_{\text{current}}$, where $0 < a < 1$. A step denotes an iteration of the inner loop in Figure 7, which is the process of scheduling a permutation and determining whether the permutation becomes the current permutation. Thermal equilibrium is reached when a certain number of down jumps or a certain number of total steps has been observed, and the freezing point, or the stopping condition, is reached when no further down jump has been observed in a certain number of steps [5, 9].
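The geometric cooling rule can be illustrated by listing the temperature levels it produces. The initial temperature 3000 and $a = 0.8$ are the values reported in Section 5; the threshold `T_min = 1.0` is purely our illustrative cut-off, since the text stops on the down-jump criterion rather than on a temperature bound.

```python
def cooling_schedule(T0: float, a: float, T_min: float) -> list:
    """Temperatures T0, a*T0, a^2*T0, ... while they stay at or above T_min."""
    assert 0 < a < 1
    temps = []
    T = T0
    while T >= T_min:
        temps.append(T)
        T *= a        # new temperature = a * current temperature
    return temps

temps = cooling_schedule(3000.0, 0.8, 1.0)
print(len(temps))      # 36 temperature levels before T drops below 1
```

Since $3000 \cdot 0.8^{35} \approx 1.22$ and $3000 \cdot 0.8^{36} \approx 0.97$, the temperature falls below 1 after 36 levels; with $a = 0.95$ the same drop would take many more levels.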


Algorithm SSA:

Begin
    choose initial temperature T
    choose the EDF permutation as the starting permutation, μ
    schedule μ by PSA and compute its energy, E
    loop
        loop
            compute neighbor permutation μ'
            schedule μ' by PSA and compute its energy, E'
            if E' < E then
                make μ' the current permutation: μ ← μ' and E ← E'
            else
                if e^((E - E')/T) > random(0, 1) then
                    make μ' the current permutation: μ ← μ' and E ← E'
                else
                    μ remains the current permutation
        until thermal equilibrium is reached
        compute new temperature: T ← a * T
    until stopping condition is reached
End


Figure 7: Set Scheduling Algorithm
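The control structure of Figure 7 can be sketched generically in Python, with the energy and neighbor functions passed in. This is an illustration under several assumptions: the name `ssa` and its parameters are ours, the defaults follow the values reported in Section 5, remembering the best permutation seen and flooring the temperature are our additions, and the toy energy and neighbor functions stand in for PSA-based scheduling purely for demonstration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

def ssa(start: Sequence, energy: Callable[[list], float],
        neighbor: Callable[[list, random.Random], list],
        T: float = 3000.0, a: float = 0.8, eq_down: int = 25, eq_steps: int = 300,
        freeze_steps: int = 2000, seed: int = 0) -> Tuple[list, float]:
    rng = random.Random(seed)
    cur = list(start)
    E = energy(cur)
    best, best_E = cur, E            # our addition: remember the best permutation seen
    since_down = 0                   # steps since the last down jump
    while since_down < freeze_steps:                 # stopping condition
        down = steps = 0
        while down < eq_down and steps < eq_steps and since_down < freeze_steps:
            cand = neighbor(cur, rng)
            E_new = energy(cand)
            steps += 1
            if E_new < E:                            # down jump: always accepted
                cur, E = cand, E_new
                down += 1
                since_down = 0
            else:
                since_down += 1
                if math.exp((E - E_new) / T) > rng.random():
                    cur, E = cand, E_new             # up (or level) jump accepted
            if E < best_E:
                best, best_E = cur, E
        T = max(a * T, 1e-12)        # new temperature = a * T (floor is our addition)
    return best, best_E

# Toy problem (ours, not from the text): energy counts misplaced items,
# and a neighbor swaps two random positions.
def misplaced(p: List[int]) -> int:
    return sum(1 for i, x in enumerate(p) if x != i)

def swap_two(p: List[int], rng: random.Random) -> List[int]:
    q = list(p)
    i, j = rng.sample(range(len(q)), 2)
    q[i], q[j] = q[j], q[i]
    return q

best, best_E = ssa([3, 0, 6, 1, 7, 2, 5, 4], misplaced, swap_two, freeze_steps=300)
print(best, best_E)
```

In an actual SSA run, `energy` would schedule the permutation with PSA and return the loss, and `neighbor` would apply one of the two methods described above.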


5 Experimental Results

Experiments are conducted to study the performance of SSA based on:

• scheduling ability:
$$\text{scheduling ability} = \frac{\text{number of times that the algorithm generates a feasible schedule}}{\text{number of times that a feasible schedule exists for the task set}}$$

• loss ratio:
$$\text{loss ratio} = \frac{\text{loss of the schedule generated by SSA} - \text{loss of an optimal schedule}}{\text{total weight of accepted non-critical tasks of an optimal schedule}}$$

• iterations: the number of permutations that the simulated annealing algorithm goes through to obtain the sub-optimal schedule
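The two ratios translate directly into code. The function names are ours, and the numbers in the usage lines are hypothetical inputs chosen only to show the arithmetic, not measured results.

```python
def scheduling_ability(feasible_generated: int, feasible_exists: int) -> float:
    """Fraction of feasible task sets for which the algorithm found a feasible schedule."""
    return feasible_generated / feasible_exists

def loss_ratio(loss_ssa: float, loss_opt: float, accepted_weight_opt: float) -> float:
    """Extra loss of SSA relative to the accepted non-critical weight of an optimum."""
    return (loss_ssa - loss_opt) / accepted_weight_opt

# Hypothetical numbers: 197 successes out of 200 feasible task sets, and a schedule
# losing 50 units of weight where the optimum loses 0 and accepts 1000.
print(scheduling_ability(197, 200))   # 0.985
print(loss_ratio(50, 0, 1000))        # 0.05
```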


We start with an EDF permutation. To study how good the result of using PSA to schedule the EDF permutation would be, the scheduling ability and loss ratio for the EDF permutation are computed as well. In our experiments, a task set consists of 100 tasks. The number of permutations of such a task set is $100! \approx 9.33 \times 10^{157}$. To study how good the output of SSA is compared to an optimal schedule, it is rather impractical to go through such a great number of permutations for a task set to derive the optimal schedule and its minimum loss for comparison. Instead, we choose to construct a task set such that the task set is feasible and the loss of its optimal schedule is 0. Although the SSA algorithm is primarily designed for an overloaded system, we apply SSA to such task sets for measuring the performance. The parameters are shown in Figure 8.
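The size of the permutation space quoted above can be checked with exact integer arithmetic from Python's standard library:

```python
import math

perms = math.factorial(100)
print(f"{perms:.2e}")          # 9.33e+157
print(len(str(perms)))         # 158 decimal digits
```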


parameter           value                      type
window length       mean_Wl = 20.0             truncated normal distribution
computation time    mean_C = mean_Wl / 3       truncated normal distribution
load                20%, 40%, 60%, 80%         constants
criticality ratio   25%, 50%, 75%              constants
weight              low_W = 1, high_W = 50     discrete uniform distribution

Figure 8: Parameters of the experiments


The mean of window length, mean_Wl, is set to 20 time units. The load is the ratio of total computation time to the largest deadline, $D$, in the task set. Hence, the load indicates the difficulty of scheduling the task set. The mean of computation time, mean_C, is one third of the mean of window length, which allows the windows of different tasks to overlap to some extent. How much the windows overlap depends partially on the load. If the load is high, the windows are congested together, and thus the overlapping is high. We expect some containing relations between tasks to occur and thus increase the difficulty of scheduling. Note that, without containing relations, scheduling the task set would be straightforward. The standard deviations of window length and computation time are set equal to their respective means. The criticality ratio indicates the percentage of critical tasks in the task set. It is set to 25%, 50%, and 75%. The higher the criticality ratio, the more difficult it is to generate a feasible schedule for the task set. On the other hand, although it is easier to come up with a feasible schedule when the criticality ratio is low, the loss ratio may still be high. It may be necessary to go through many permutations before an acceptable loss ratio is reached. In our experiments, the acceptable loss ratio is set to 0%, which means that SSA will keep trying different permutations until either the loss ratio is 0 or the stopping condition is reached, in which case SSA fails to find an optimal schedule. Note that a large energy (loss), 1000, is incurred for each rejected critical task. Hence, for an infeasible schedule, the loss ratio may well be more than 100%. The weight of a non-critical task is an integer ranging from low_W = 1 to high_W = 50, determined by a discrete uniform distribution. For each individual experiment with different parameters, 200 task sets, each with 100 tasks, are generated for scheduling. The way of creating a feasible task set without loss is described in Appendix A.

From Figure 9a, the scheduling ability of SSA is 98.5% when the criticality ratio is 75% and the load is 80%, and is 100% for the other, lower criticality ratios and loads. This is because the simulated annealing algorithm focuses on searching for suitable neighbor permutations in such a way that the rejected critical tasks, if any, may be accepted. Note that scheduling only the EDF permutation cannot always generate a feasible schedule. The scheduling ability of scheduling the EDF permutation degrades when the load increases, which means tasks are more congested. It also degrades when the criticality ratio increases, which makes meeting the deadlines of all critical tasks more difficult.

As far as non-critical tasks are concerned, SSA cannot guarantee the minimum loss. However, even in the worst case given in Figure 9b, the loss ratio is less than 10%. The loss ratio decreases when the criticality ratio or the load decreases. In many cases, the loss ratios are less than 5%. When only the EDF permutation is scheduled, the loss ratios are significantly larger.

The number of permutations to be searched in simulated annealing depends on the pattern of energy jumps, the way of reducing the temperature, and how we define thermal equilibrium and the stopping conditions. In the experiments, we find that reducing the temperature faster does not have a negative impact on the scheduling ability and loss. How to set the parameters in simulated annealing differs a great deal from one application to another. We do want to generate as good a result as possible, but are not willing to spend more computation time than necessary. This usually requires fine-tuning the parameters to strike a trade-off between the two goals. We find that the following parameters are beneficial: initial temperature = 3000; $a = 0.8$ (instead of 0.95 or even 0.99 as suggested in other applications); number of down jumps to obtain thermal equilibrium = 25; number of total steps to obtain thermal equilibrium = 300; number of steps with no further down jump to obtain the freezing point = 2000, which is also the stopping condition. The average number of permutations searched in simulated annealing is given in Figure 9c. If SSA successfully generates a feasible schedule, the average number of permutations checked is no more than 4000. The number increases a little if SSA fails to find a feasible schedule, because in this case SSA does not stop until the freezing point is reached. Note that the average numbers of permutations are less than which can roughly give us the idea about the complexity of searching over the permutation space. Additional studies have shown that if we modify the above parameters to increase the average number of permutations by about 10 times, the loss ratios can be further reduced by about 25% of the loss ratios obtained here.

If time can be expressed in integers, the dynamic programming technique used in PSA can be applied by computing $\sigma_k(t)$, $t = 1, \ldots, D$. Let us call this approach the integral PSA, in contrast to the original PSA with scheduling points, denoted by PSA SP in Figure 9d. Obviously, the integral PSA tends to compute more schedules than the original PSA. We would like to see how much more efficient the original PSA algorithm is than the integral PSA. Specifically, we compare the average number of schedules required to derive the optimal schedule for a permutation. For the integral PSA, the number of schedules computed is fixed, namely $n \cdot D$, as can be seen in Figure 1. For the original PSA, the count is the number of schedules actually needed to schedule a permutation. The average number of schedules needed to schedule a permutation by PSA is computed over the permutations of a task set and is presented in Figure 9d. The number for the original PSA decreases with the criticality ratio. This is because a critical task never increases the number of scheduling points; instead, the number of scheduling points might decrease due to the timing constraint of the critical task. For criticality ratios of 0.25, 0.50, and 0.75, the average numbers of schedules required for a task set of 100 tasks are approximately 480, 250, and 150, respectively. The complexity of the original PSA thus seems linear. On the other hand, the complexity of the integral PSA is quite high. Its number of schedules decreases with load. This happens to be related to the way of generating the task set, in which $D$ = total_c / load. The number is equal to $n \cdot D$, where $D$ might fluctuate a little.
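A back-of-the-envelope estimate shows why the integral PSA is expensive. This sketch is our own approximation: it replaces the actual total computation time by $n \cdot$ mean_C with mean_C $= 20/3$ from Figure 8, so the values are estimates of $n \cdot D$, not the measured averages of Figure 9d.

```python
def integral_psa_schedules(n: int, mean_c: float, load: float) -> float:
    """Estimated n * D with D = total_c / load, using n * mean_c for total_c."""
    D = n * mean_c / load
    return n * D

for load in (0.2, 0.4, 0.6, 0.8):
    print(load, round(integral_psa_schedules(100, 20 / 3, load)))
```

Even at the highest load the estimate is about 83,000 schedules per permutation, against the roughly 150 to 480 reported for PSA with scheduling points.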

6  Conclusion 

In this paper, we study the scheduling problem for a real-time system that is overloaded. A significant performance degradation may be observed in the system if the overload problem is not addressed properly [2]. As not all the tasks can be processed, the set of tasks selected for processing is crucial for the proper operation of an overloaded system. We assign criticalities and weights to the tasks, on the basis of which the tasks are selected. The objective is to generate an optimal schedule for the task set such that all of the critical tasks are accepted and the loss of weights of non-critical tasks is minimized.

We present a two-step process for generating a schedule. First, we develop a schedule for a permutation of tasks using a pseudo-polynomial algorithm, for which the concept of scheduling points is proposed. In order to find the optimal schedule for the task set, we would have to consider all permutations. The simulated annealing technique is used to limit the search space while obtaining optimal or near-optimal results. Our experimental results indicate that the approach is very efficient.

The work presented in this paper can be easily extended to address the overload issue for periodic tasks. To schedule a set of periodic tasks with criticalities and weights, we can convert the periodic tasks within a time frame equal to the least common multiple of the task periods into aperiodic tasks. The schedule generated for the frame can be applied repeatedly for the subsequent time frames.
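The conversion can be sketched by unrolling each periodic task into its instances within one frame. We assume implicit deadlines (each instance is released at the start of its period and due at its end), which the text does not specify; the `Periodic` type and the name `unroll` are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
import math
from functools import reduce
from typing import List, NamedTuple, Tuple

class Periodic(NamedTuple):
    name: str
    period: int
    c: int          # computation time per instance
    critical: bool
    w: int = 0

def unroll(tasks: List[Periodic]) -> List[Tuple[str, int, int, int, bool, int]]:
    """Instances (name, r, d, c, critical, w) within one LCM frame, assuming each
    instance is released at the start of its period and due at its end."""
    frame = reduce(math.lcm, (t.period for t in tasks))
    jobs = []
    for t in tasks:
        for m in range(frame // t.period):
            jobs.append((f"{t.name}#{m}", m * t.period, (m + 1) * t.period,
                         t.c, t.critical, t.w))
    return jobs

print(unroll([Periodic("A", 4, 1, True), Periodic("B", 6, 2, False, 3)]))
```

With periods 4 and 6 the frame is 12, giving three instances of A and two of B; the resulting aperiodic tasks can then be handed to SSA.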

Our algorithm can also be applied to the problem of scheduling imprecise computations [7], in which a task is decomposed logically into a mandatory subtask, which must finish before the deadline, and an optional subtask, which may not finish. The goal is to find a schedule such that the mandatory subtasks all finish by their deadlines and the sum of the computation times of the unfinished optional subtasks is minimum. A schedule satisfies the 0/1 constraint if every optional subtask is either completed or discarded [7]. We can solve this problem with our algorithm by setting the mandatory subtasks to be critical and the optional subtasks to be non-critical, with weights equal to their computation times.
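The mapping can be sketched as a simple transformation of the task list. The `Imprecise` type, the tuple layout of the output and the name `to_weighted` are our illustrative choices; we also assume that each optional subtask is kept after its mandatory subtask in the permutation, an ordering constraint the text does not spell out.

```python
from typing import List, NamedTuple, Tuple

class Imprecise(NamedTuple):
    name: str
    r: int          # release time
    d: int          # deadline
    mandatory: int  # computation time of the mandatory subtask
    optional: int   # computation time of the optional subtask

# (name, r, d, c, critical, weight): illustrative task tuple
Job = Tuple[str, int, int, int, bool, int]

def to_weighted(tasks: List[Imprecise]) -> List[Job]:
    """Mandatory subtasks become critical; optional ones become non-critical with
    weight equal to their computation time, so the rejected weight equals the
    discarded optional time (0/1 constraint)."""
    jobs: List[Job] = []
    for t in tasks:
        jobs.append((t.name + ".m", t.r, t.d, t.mandatory, True, 0))
        if t.optional > 0:
            # Listed right after its mandatory part; we assume the permutation keeps
            # this order, which the text does not spell out.
            jobs.append((t.name + ".o", t.r, t.d, t.optional, False, t.optional))
    return jobs
```

Minimizing the loss of the resulting weighted task set then minimizes the total computation time of discarded optional subtasks.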

Appendix  A.  Generating  a  task  set 

Generate computation times for tasks according to mean_C and the standard deviation

D = (total computation time) / load

Assign starting instants, s_k, to tasks such that
    the intervals between the computation times are truncated normally distributed
For each task τ_k
    Determine the criticality by criticality_ratio and/or the weight by low_W and high_W
    Compute the window length of τ_k according to mean_Wl and the standard deviation
        (note that window length ≥ c_k)
    Align the window with the computation time at their middle points:
        r_k = max(0, s_k + c_k/2 - window_length/2)
        d_k = min(D, r_k + window_length)


The load determines how congested the tasks are. Once the largest deadline, $D$, has been computed, we separate the computation times of the tasks in such a way that their positions on the time axis stretch over the range from 0 to $D$. Note that the starting instants of the computation times constitute an optimal schedule for the task set. In this way, all of the tasks in the task set can be accepted. Finally, the windows are aligned with the computation times.
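The generation procedure can be sketched in Python as follows. Several details are our assumptions rather than the text's: truncation is done by redrawing samples below a bound, the idle gaps between computation times are drawn from a normal distribution with mean 1 and standard deviation 1 and then scaled so that the placement ends exactly at $D$, the critical tasks are an exact `crit_ratio` fraction chosen at random, and the names `truncnorm` and `generate` are hypothetical.

```python
import random
from typing import List, NamedTuple, Tuple

class Task(NamedTuple):
    r: float        # release time
    d: float        # deadline
    c: float        # computation time
    critical: bool
    w: int          # weight (0 for critical tasks)

def truncnorm(rng: random.Random, mean: float, sd: float, low: float) -> float:
    """Normal sample redrawn until it is >= low (our simple truncation rule)."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= low:
            return x

def generate(n: int = 100, mean_wl: float = 20.0, load: float = 0.8,
             crit_ratio: float = 0.5, low_w: int = 1, high_w: int = 50,
             seed: int = 0) -> Tuple[List[Task], float]:
    rng = random.Random(seed)
    mean_c = mean_wl / 3                              # mean_C = mean_Wl / 3
    cs = [truncnorm(rng, mean_c, mean_c, 0.1) for _ in range(n)]  # sd = mean
    D = sum(cs) / load                                # largest deadline
    # Idle gaps between computation times (truncated normal, then scaled) so the
    # back-to-back placement stretches exactly over [0, D].
    gaps = [truncnorm(rng, 1.0, 1.0, 0.0) for _ in range(n)]
    scale = (D - sum(cs)) / sum(gaps)
    critical_ids = set(rng.sample(range(n), round(crit_ratio * n)))
    tasks, t = [], 0.0
    for k, (c, g) in enumerate(zip(cs, gaps)):
        s = t + g * scale                             # starting instant s_k
        t = s + c
        crit = k in critical_ids
        w = 0 if crit else rng.randint(low_w, high_w)
        wl = truncnorm(rng, mean_wl, mean_wl, c)      # window length >= c_k
        r = max(0.0, s + c / 2 - wl / 2)              # align middle points
        d = min(D, r + wl)
        tasks.append(Task(r, d, c, crit, w))
    return tasks, D
```

Because every computation time sits inside its own window by construction, the starting instants $s_k$ form a schedule that accepts all tasks, so the optimal loss of each generated set is 0.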
Abstract

This paper introduces a new formulation of dynamic systems that subsumes both the classical discrete and differential equation models as well as current trends in hybrid models. The key idea is to express the system dynamics using symbols to which the notion of time is explicitly attached. The state of the system is described using symbols which are active for a defined period of time. The system dynamics are then represented as relations between the symbolic representations.

We  describe  the  notation  and  give  several  examples  of  its  use. 
1 Introduction


Traditionally, systems have been modelled using state variables defined in a metric space, with the system dynamics defined using differential equations. This approach uses continuous descriptions of space and time. When we use computers for expressing and manipulating such models, we have to use symbols to represent them. Symbols are discrete by their very nature, and require a mapping from the continuous spaces to discrete spaces. These mappings cause problems unless they are carried out carefully. Further, for problems in which some aspects of the system are genuinely discrete, hybrid models have been used. As different techniques have to be used for the continuous and discrete aspects of the system, significant complexity gets added to such models.

Recognizing that computer systems use only symbols for any representation, in this paper we present a formulation of system dynamics directly in terms of symbols. In order to handle the dynamics, the time interval over which a symbol is considered valid is explicitly attached to it. The symbols describing different aspects of the system may be drawn from a set appropriate for that aspect. The dynamics is described in terms of rules connecting the symbolic representations.

This  paper  contains  the  preliminary  formulation  of  system  dynamics  in  the  framework  of  Symbol  Dynamics. 

2  Descriptions  of  System  Behavior 

For the purposes of this paper, behavior includes all the relationships among parts of a system at the same or different times. In particular, the combined relationships among parts of a system at the same time are usually called structure. Both of these aspects are subsumed in our use of the term behavior.

We assume that our ability to generate or derive new information about the system behavior changes only at discrete points in time, since we expect to perform these processes on digital computers. These event times define the time scale. In this paper, we introduce Symbol Dynamics, a totally symbolic way to represent the important aspects of dynamical systems and processes, so that we can reason about them using computers.

3  Concepts  and  Notations 

This  section  contains  the  basic  notions  of  Symbol  Dynamics. 

3.1  State  Variable 

We  assume  that  systems  exist  and  change  over  time.  We  are  looking  for  a  method  of  describing  those  changes  so  we 
can  compute  how  to  control  them. 

The  systems  we  consider  can  be  described  with  state  variables.  Each  state  variable  is  an  observation  on  the  system 
or  a  derivation  from  other  state  variables. 

We  may  or  may  not  know  a  priori  which  state  variables  are  important,  or  even  which  ones  are  determinable  (i.e.,  the 
system  comes  first,  and  the  state  variables  are  chosen  to  be  helpful  in  describing  the  behavior).  We  might  call  the 
state  variables  attributes  of  the  state. 

3.2  Symbol 

We  want  to  measure  and  compute  with  information  about  a  system,  so  we  need  to  map  the  system  into  formal  spaces 
we  understand  better. 

A  type  is  a  symbol  set,  both  representing  a  set  of  values  and  including  some  operations  on  those  values;  this  is  the 
notion  of  formal  space  used  here.  It  includes  collections  of  mutually  dependent  types  and  functions  between  different 
types. 

A symbol of a given type is an element of the set of values of that type. Any notions of credibility, confidence, or uncertainty are part of the type system that is used. It is especially important to define the allowable operations on these kinds of types. For example, for measurements of a system, the symbol would include the measured value and the associated uncertainty value.
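As an illustration of a type whose symbols carry uncertainty, here is a minimal C sketch of a hypothetical measurement type whose allowed operations propagate worst-case (interval) error bounds. The names and the choice of interval arithmetic are assumptions, not part of the formulation.

```c
#include <assert.h>

/* A symbol of a hypothetical "measurement" type: value plus error bound. */
typedef struct {
    double value;
    double uncertainty;   /* true quantity lies in [value - u, value + u] */
} measurement_t;

measurement_t meas_make(double value, double uncertainty)
{
    assert(uncertainty >= 0.0);
    measurement_t m = { value, uncertainty };
    return m;
}

/* Sum: worst-case bounds add. */
measurement_t meas_add(measurement_t a, measurement_t b)
{
    measurement_t r = { a.value + b.value, a.uncertainty + b.uncertainty };
    return r;
}

/* Scaling by an exact constant k scales the bound by |k|. */
measurement_t meas_scale(measurement_t a, double k)
{
    measurement_t r = { k * a.value, (k < 0.0 ? -k : k) * a.uncertainty };
    return r;
}
```

Because the operations are part of the type, any computation built from them carries its uncertainty along automatically.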

3.3  Attribute  Identifier 

We  assume  that  we  will  want  to  know  different  things  about  the  system  behavior.  We  need  names  to  keep  track  of 
the  different  things  we  measure  or  compute. 


An  attribute  identifier  is  a  name  for  a  state  variable  (a  state  variable  is  like  a  probe  into  some  aspect  of  the  system 
behavior,  and  the  attribute  identifier  is  only  the  label). 

3.4  Expression 


An  expression  is  a  pair 

(attribute  identifier:  symbol), 

which  is  interpreted  to  mean  the  assertion  that  the  state  variable  can  be  described  by  the  symbol  (when  the  expression 
is  active).  We  will  describe  the  precise  semantics  of  these  expressions  later  on. 

These  are  models  of  the  state  variable  values. 

3.5  Interval 

An  interval  is  a  pair 

[start  time,  end  time), 

assumed to describe a half-open interval (to save us from trouble with the topology). The end time may be omitted, in which case it is interpreted to mean infinity by default.

3.6  Characterizer 

A  characterizer  is  a  pair 
(expression,  interval), 
also  written 

(attribute  identifier:  symbol;  start  time,  end  time), 

interpreted  to  mean  that  the  expression  is  active  during  the  specified  interval.  It  becomes  active  at  the  start  time, 
and  becomes  inactive  at  the  end  time.  Each  characterizer  has  a  range  (its  interval  of  activity)  and  a  scope  (the  set 
of  attribute  identifiers  that  occur  in  its  expression). 
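A characterizer can be represented directly as a record. The C sketch below is one possible encoding, assuming symbols are plain numbers (a simplification; in general the symbol comes from whatever type the attribute uses). An omitted end time is encoded as `INFINITY`, and activity follows the half-open convention of Section 3.5. The names are hypothetical.

```c
#include <assert.h>
#include <math.h>   /* INFINITY: a constant macro, no library call needed */

typedef struct {
    const char *attribute;  /* attribute identifier (the scope) */
    double value;           /* symbol: a number here, for simplicity */
    double start, end;      /* range: half-open interval [start, end) */
} characterizer_t;

/* (attribute : value; start, end) */
characterizer_t make_characterizer(const char *attribute, double value,
                                   double start, double end)
{
    assert(start < end);
    characterizer_t c = { attribute, value, start, end };
    return c;
}

/* (attribute : value; start) with the end time omitted, i.e. infinite. */
characterizer_t make_open_characterizer(const char *attribute, double value,
                                        double start)
{
    return make_characterizer(attribute, value, start, INFINITY);
}

/* Active at t iff start <= t < end. */
int is_active(const characterizer_t *c, double t)
{
    return c->start <= t && t < c->end;
}
```

The characterizer becomes active at `start` and inactive at `end`; those two instants are its events.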

We may also consider a symbol set that includes arithmetic expressions that contain an explicit time variable $t$. For example, a characterizer whose symbol is such an expression in $t$ represents a continuous change along the interval.

We  will  also  have  occasion  to  reason  about  conditions  at  particular  points  in  time,  so  the  assertion  language  will  also 
have  characterizers  of  the  form 
(expression,  point). 

3.7  Event 

An  event  is  the  activation  or  deactivation  of  a  characterizer.  We  make  no  limiting  assumptions  about  simultaneous 
events. 


4  System  Description 

A system description is a finite set of characterizers, so we assume explicitly that a system can be described by a finite set of characterizers. We insist that only a finite set of characterizers be active at any one time. Since each of those characterizers is active over a half-open interval of positive length, there is therefore some small interval after any time $t$ during which all of the characterizers active at $t$ are still active.

Everything we know about a system's behavior is described by characterizers and relationships among the characterizers. Domain models and context can be written as characterizers, generally with large intervals.

4.1  Dynamics 

Relationships  among  characterizers  are  rules  that  define  the  dynamics.  These  rules  take  the  form: 

if  these  characterizers  (with  a  list)  are  active  on  these  intervals,  then  this  new  one  is  also  active  on  this 
other  interval  (not  necessarily  contained  in  the  intersection  of  the  original  intervals). 


Rules  can  contain  variable  identifiers,  with  implicit  universal  quantification. 

Relationships  hold  on  intervals  and  the  combination  may  extend  the  range.  We  generate  new  characterizers  according 
to  the  relationships,  either  predictive  (range  extension)  or  deductive  (knowledge  extension). 

The  language  in  which  the  rules  are  written  is  important,  since  it  has  to  accommodate  notations  from  many  different 
types,  many  of  which  will  not  be  known  when  the  language  is  defined.  Some  basic  concepts  that  will  be  in  any  of 
these  languages  are  continuity  and  derivatives. 

It  is  important  to  remember  that  the  system  comes  first,  and  that  the  state  variables  are  our  choices  for  modeling 
and  understanding  the  system.  This  means  in  particular  that  the  coordinate  systems  we  use  are  temporary,  and  that 
the  constraints  among  the  state  variables  are  expressed  explicitly  as  relationships. 


4.2  Normalization  and  Continuation 


Characterizers may have overlapping intervals. Normalization is the process of breaking each characterizer into two or more others, to fit the time scale. If $t$ is an event time, and
$(a : v;\ s, e)$
is a characterizer with $s < t < e$, then we can replace it with two characterizers
$(a : v;\ s, t)$ and $(a : v;\ t, e)$.

If two characterizers use the same attribute,
$(a : v;\ s, e)$
and
$(a : w;\ t, u)$,
then we say that the second one continues the first one iff they are adjacent in time, so $t = e$. Continuity considerations in the transition from $v$ to $w$ at time $t$ are treated in the next section.

In  any  system  with  a  finite  density  of  event  times,  if  we  split  every  characterizer  that  spans  an  event  time,  then  we 
end  up  with  characterizers  that  start  and  stop  at  consecutive  event  times  (though  they  may  be  continued  by  other 
characterizers).  This  has  some  computational  conveniences. 
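Normalization against a sorted list of event times can be sketched in C as below. This is a minimal illustration assuming numeric symbols and single-character attribute identifiers; `normalize_at` and `normalize` are hypothetical names. Each event time that falls strictly inside the characterizer's interval produces a split, so the pieces start and stop at consecutive event times.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char   attribute;   /* attribute identifier (one letter, for brevity) */
    double value;       /* symbol v (a number, for simplicity) */
    double start, end;  /* half-open interval [start, end) */
} characterizer_t;

/* If s < t < e, split (a : v; s, e) into (a : v; s, t) and (a : v; t, e). */
int normalize_at(characterizer_t c, double t,
                 characterizer_t *left, characterizer_t *right)
{
    if (!(c.start < t && t < c.end))
        return 0;                  /* t not strictly inside: nothing to split */
    *left = c;
    left->end = t;
    *right = c;
    right->start = t;
    return 1;
}

/* Split c at every event time (sorted ascending) strictly inside it.
   out needs room for m + 1 pieces; returns the number of pieces. */
size_t normalize(characterizer_t c, const double *events, size_t m,
                 characterizer_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < m; i++) {
        assert(i == 0 || events[i - 1] <= events[i]);
        characterizer_t left, right;
        if (normalize_at(c, events[i], &left, &right)) {
            out[n++] = left;
            c = right;             /* keep splitting the remainder */
        }
    }
    out[n++] = c;
    return n;
}
```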

If we have two characterizers
$(a : v;\ t_1, t_2)$
and
$(a : w;\ t_2, t_3)$,
so that the second one continues the first, then we need some kind of explicit characterizer for the transition, active in an interval containing the transition time. If there is a description $u$ in an appropriate domain for which
$$u(t) = \begin{cases} v, & t_1 \le t < t_2, \\ w, & t_2 \le t < t_3, \end{cases}$$
then we can conclude
$(a : u;\ t_1, t_3)$.
This is the opposite of normalization.

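If there is an overlap, that is, if the two characterizers
$(a : v;\ t_1, t_2)$
and
$(a : w;\ t_3, t_4)$
have
$[t_1, t_2) \cap [t_3, t_4)$ non-empty, and if
$v(t) = w(t)$ for $t \in [\max(t_1, t_3), \min(t_2, t_4))$,
then we can also conclude
$(a : u;\ \min(t_1, t_3), \max(t_2, t_4))$,
where $u$ agrees with $v$ on $[t_1, t_2)$ and with $w$ on $[t_3, t_4)$.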
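The reverse operation (combining adjacent or overlapping characterizers of one attribute) can be sketched for the simplest case, where both symbols are the same constant, so that the combined description $u$ is that constant. The C code assumes numeric symbols; `merge` and the record layout are hypothetical. The piecewise $u$ for differing symbols is not handled.

```c
#include <assert.h>

typedef struct {
    char   attribute;
    double value;       /* constant symbol */
    double start, end;  /* [start, end) */
} characterizer_t;

/* Combine p and q into (a : u; min start, max end) when they share the
   attribute, are adjacent or overlap, and agree (here: equal constants). */
int merge(const characterizer_t *p, const characterizer_t *q,
          characterizer_t *out)
{
    assert(p->start < p->end && q->start < q->end);
    if (p->attribute != q->attribute)
        return 0;
    double lo = p->start > q->start ? p->start : q->start;  /* max of starts */
    double hi = p->end < q->end ? p->end : q->end;          /* min of ends */
    if (lo > hi)
        return 0;             /* a gap separates the intervals */
    if (p->value != q->value)
        return 0;             /* symbols disagree: no single constant u */
    out->attribute = p->attribute;
    out->value = p->value;
    out->start = p->start < q->start ? p->start : q->start;
    out->end = p->end > q->end ? p->end : q->end;
    return 1;
}
```

The case `lo == hi` is exactly continuation (adjacent intervals); `lo < hi` is a genuine overlap.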


4.3  Continuation  and  Continuity 

One  aspect  of  continuity  is  transitions  from  one  symbol  to  another  across  interval  boundaries.  The  transition 
relations  are  extra  conditions  that  have  to  hold  at  the  transition  time  (usually  they  are  smoothness  conditions  for 
model  transitions). 

A typical smoothness property is infinitesimal: for characterizers
$(a : v;\ t_0, t_1)$
and
$(a : w;\ t_1, t_2)$
we normally want smoothness, written
$\frac{dv}{dt} = \frac{dw}{dt}$ at $t = t_1$,
and  continuity,  written 

Both of these are point conditions on the attributes and their derivatives, and we can consider only conditions on attributes by using whatever derivatives are needed in the conditions: instead of
$(a : v;\ t_0, t_1)$,
we use
$(a : (v, v');\ t_0, t_1)$,

and  write  our  smoothness  condition  as 


If we also require continuity in each attribute, so that
$v(t = t_1^-) = w(t = t_1)$,
then the upper limit in the previous expression can be omitted.

It is therefore clear that we must deal with point events at transitions but not with point characterizers. If we make the transition continuity a property of the definition of continuation, then we can assert it or not in any given model.

Of course, the expression $t = t_1^-$ means that the interval $[t_1 - \epsilon, t_1)$ is part of the limit computation for every $\epsilon$ small enough, so we might be able to use these intervals for some small enough $\epsilon$ without having to take the limits.

We will deal with these considerations in the simplest way possible. We have a characterizer that asserts continuity of the relevant attribute across a larger interval, such as $[t_0, t_2)$ above. The only place that the continuity characterizer has new information is at the transition point $t_1$, but we simply do not worry about the redundancy.
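For the simplest case, where the symbols are polynomials in $t$, the continuity and smoothness conditions at the transition point $t_1$ can be checked directly, because a polynomial's left limit equals its value there. The C sketch below assumes cubic polynomials and an absolute tolerance; the names are hypothetical.

```c
#include <assert.h>

/* Symbol as a cubic c[0] + c[1] t + c[2] t^2 + c[3] t^3 (an assumption). */
typedef struct { double c[4]; } poly3_t;

static double poly_value(poly3_t p, double t)      /* Horner evaluation */
{
    return ((p.c[3] * t + p.c[2]) * t + p.c[1]) * t + p.c[0];
}

static double poly_slope(poly3_t p, double t)      /* dv/dt */
{
    return (3.0 * p.c[3] * t + 2.0 * p.c[2]) * t + p.c[1];
}

static int close_enough(double a, double b, double tol)
{
    assert(tol >= 0.0);
    double d = a - b;
    return d <= tol && -d <= tol;
}

/* Continuity at t1 for (a : v; t0, t1) followed by (a : w; t1, t2). */
int continuous_at(poly3_t v, poly3_t w, double t1, double tol)
{
    return close_enough(poly_value(v, t1), poly_value(w, t1), tol);
}

/* Smoothness at t1: matching first derivatives. */
int smooth_at(poly3_t v, poly3_t w, double t1, double tol)
{
    return close_enough(poly_slope(v, t1), poly_slope(w, t1), tol);
}
```

For example, $v(t) = t$ followed by $w(t) = 2t - 1$ is continuous but not smooth at $t_1 = 1$, while $v(t) = t^2$ followed by the same $w$ is both.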


4.4  Characterizer  Semantics  and  Inference 

A  characterizer  is  what  we  want  to  assume  about  what  is  true  over  its  interval.  It  need  not  be  consistent  with 
the  other  characterizers  in  a  system  description;  we  explicitly  allow  false  assertions  here,  so  we  can  reason  using 
counterfactuals. 


4.4.1  Inference 

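We can make inferences within intervals, according to some rules. If, say, there is a rule
$S_1 \,\&\, S_2 \Rightarrow S_3$,
and two characterizers
$(v : S_1;\ t_0, t_1)$
and
$(v : S_2;\ t_2, t_3)$
with $t_0 < t_2 < t_1 < t_3$, then we can conclude, on the intersection $[t_2, t_1)$ of the two intervals,
$(v : S_3;\ t_2, t_1)$.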
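The inference step above (conclude the rule's symbol on the intersection of the two intervals) can be sketched in C as follows, assuming symbols are compared as strings and that, as in the example, both premises concern the same attribute. The names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *attribute;
    const char *symbol;     /* symbols as strings, e.g. "S1" (an assumption) */
    double start, end;      /* [start, end) */
} characterizer_t;

typedef struct {            /* premise1 & premise2 => conclusion */
    const char *premise1, *premise2, *conclusion;
} rule_t;

/* If c1 carries premise1 and c2 carries premise2 for the same attribute and
   their intervals overlap, conclude the rule's symbol on the intersection. */
int infer(const rule_t *r, const characterizer_t *c1,
          const characterizer_t *c2, characterizer_t *out)
{
    assert(r && c1 && c2 && out);
    if (strcmp(c1->attribute, c2->attribute) != 0)
        return 0;
    if (strcmp(c1->symbol, r->premise1) != 0 ||
        strcmp(c2->symbol, r->premise2) != 0)
        return 0;
    double lo = c1->start > c2->start ? c1->start : c2->start;
    double hi = c1->end < c2->end ? c1->end : c2->end;
    if (!(lo < hi))
        return 0;           /* empty intersection: nothing to conclude */
    out->attribute = c1->attribute;
    out->symbol = r->conclusion;
    out->start = lo;
    out->end = hi;
    return 1;
}
```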

4.4.2  Prediction 

We can also make inferences that extend intervals in some cases. They take the form: if
$(v : s_1;\ t_0, t_1)$
and
$(w : s_2;\ t_0, t_1)$
are characterizers with $t_0 < t_1$, then there is a characterizer active on $[t_2, t_3)$
for some $t_2, t_3$ with $t_0 < t_2 < t_1 < t_3$.


4.4.3  Truth  Maintenance 


Because  we  do  not  presume  that  the  characterizers  in  a  system  are  truths,  we  need  to  be  much  more  careful  about 
when  they  can  be  used  together,  especially  in  the  inference  and  prediction  processes.  Since  the  inference  rules 
themselves  are  time  dependent,  we  need  to  keep  track  of  the  dependencies  of  every  characterizer,  both  how  and  when 
it  was  derived  (how  tells  us  about  hypotheses  and  inference  rules;  when  helps  us  in  checking  temporal  consistency) 
and  its  interval  of  activity. 

We also need a way to indicate which characterizers we DO want to be true, so that different collections of characterizers can be compared and contrasted within the same context. We might want to consider computing various maximal consistent sets of irredundant assertions as an aid in this process.
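One way to record the dependencies described above is a justification record per characterizer: its premises and rule (how), its derivation time (when), and its interval of activity. The C sketch below is a hypothetical, simplified justification-based scheme in which retracting a hypothesis withdraws every conclusion that transitively depends on it. It does not restore beliefs or choose among competing conclusions, which a non-monotonic logic would need.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREMISES 2

typedef struct {
    int    premises[MAX_PREMISES];  /* indices of supporting nodes, -1 = unused */
    int    rule_id;                 /* how: rule used (-1 for a hypothesis) */
    double derived_at;              /* when: time the derivation was made */
    double start, end;              /* interval of activity */
    int    believed;                /* 1 = in, 0 = retracted */
} tms_node_t;

/* Retract node h and, transitively, every derived node depending on it. */
void tms_retract(tms_node_t *nodes, size_t n, size_t h)
{
    assert(h < n);
    nodes[h].believed = 0;
    int changed = 1;
    while (changed) {                   /* propagate to a fixed point */
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!nodes[i].believed || nodes[i].rule_id < 0)
                continue;               /* already out, or a hypothesis */
            for (int k = 0; k < MAX_PREMISES; k++) {
                int p = nodes[i].premises[k];
                if (p >= 0 && !nodes[p].believed) {
                    nodes[i].believed = 0;
                    changed = 1;
                    break;
                }
            }
        }
    }
}
```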

Various  rules  can  be  activated  that  lead  to  new  conclusions  in  an  interval,  which  can  supersede  old  ones;  we  also 
assume  partial  deduction,  not  total.  We  therefore  need  to  use  some  kind  of  non-monotonic  logic. 

4.5  Analysis 

Simulation  is  a  continuing  surprise. 

We  want  tools  with  analytic  power  to  help  reduce  our  reliance  on  simulation,  so  we  can  make  reliable  predictions 
about  the  system  behavior. 

All of our computations are performed from the symbols active at a given time. The advantage of dealing explicitly with time in this formulation is that we can sit outside the usual sequencing of events, taking a kind of “sidelong” look at the entire time line, and piece together parts of the models that we know more about regardless of whether or not they are the first ones in our time interval of interest.

We  can  also  perform  the  deductions  in  an  order  that  is  different  from  the  order  imposed  by  time,  using  any  of  a 
number  of  simple  mechanisms,  such  as  rule-based  systems  or  rewrite  logics;  both  are  being  investigated. 

5  Examples 

This  section  contains  several  examples  that  illustrate  the  utility  of  the  notation. 

5.1 ODE

A  simple  example  that  shows  range  extension  is  an  ordinary  differential  equation  (ODE).  For  ODEs,  the  solution 
method  is  part  of  changing  an  ODE  into  a  set  of  characterizers. 

So let us consider a simple second-order ODE for the sine function,
$y'' = -y$,
$y'(0) = 1$,
$y(0) = 0$,
and solve it with Euler's method (a particularly bad one for this kind of problem, by the way).

First, we transform the equations into a first-order system (in the usual way) by taking $x = y'$,
$x' = -y$,
$y' = x$,
$x(0) = 1$,
$y(0) = 0$,
and we also define $z = x' = y''$.

5.1.1  First-Order 

Now the way Euler's method works is by linear extrapolation, so for a given time $t = t_0$, if we have
$x(t_0) = x_0$,
$y(t_0) = y_0$,
then we have
$z_0 = z(t_0) = -y_0$,
and we take
$x(t) = x_0 + z_0 \cdot (t - t_0)$,
$y(t) = y_0 + x_0 \cdot (t - t_0)$,
for $t$ in some small interval $[t_0, t_0 + dt)$.

The  characterizers  that  describe  this  situation  are: 

$(x : x_0 + z_0 \cdot (t - t_0);\ t_0, t_0 + dt)$,
$(y : y_0 + x_0 \cdot (t - t_0);\ t_0, t_0 + dt)$,
which we want to be true for all choices of $x_0, y_0, t_0, dt$ (which ones we actually use in our system description depends on how we choose the time intervals in the solution).

The characterizers that describe the initial conditions are difficult, because they cannot be described with half-open intervals of the shape we have described thus far; they are point characterizers:
$(x : 1;\ 0)$,
$(y : 0;\ 0)$,
which is always going to be a problem in systems that start at a certain time.

In  a  more  sophisticated  system,  the  choice  of  next  time  interval  would  depend  on  the  computed  accuracy  of  the 
current  solution. 

For this example, we simply make all the time intervals the same, and say that the characterizer pair
$(x : x_1 + z_1 \cdot (t - t_1);\ t_1, t_1 + dt)$,
$(y : y_1 + x_1 \cdot (t - t_1);\ t_1, t_1 + dt)$
propagates the pair
$(x : x_0 + z_0 \cdot (t - t_0);\ t_0, t_0 + dt)$,
$(y : y_0 + x_0 \cdot (t - t_0);\ t_0, t_0 + dt)$
iff
$x_1 = x_0 + z_0 \cdot dt$,
$y_1 = y_0 + x_0 \cdot dt$,
$t_1 = t_0 + dt$,
which are the conditions for the first pair to meet the second (the condition $z_1 = -y_1$ is part of the definition of these characterizer pairs).

Extending the iteration, we have
$x(0) = 1$,
$y(0) = 0$,
$x(k+1) = x(k) - y(k) \cdot dt$,
$y(k+1) = y(k) + x(k) \cdot dt$,
which can be written as a vector equation (we put the matrix on the right so we can use row vectors)
$(x, y)(0) = (1, 0)$,
$$(x, y)(k+1) = (x, y)(k) \begin{pmatrix} 1 & dt \\ -dt & 1 \end{pmatrix},$$
so if we write $I$ for the identity matrix and $J$ for the matrix
$$J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$
then we have (with $X = (x, y)$)
$X(0) = (1, 0)$,
$X(k+1) = X(k)(I + J \cdot dt)$,
so
$X(k) = (1, 0)(I + J \cdot dt)^k$,
which can be computed exactly.

Since the eigenvalues of $(I + J \cdot dt)$ are $1 \pm i \cdot dt$, which have magnitude $\sqrt{1 + dt^2} > 1$, the successive powers of the matrix diverge for any $dt > 0$, and therefore so does the iteration.
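The divergence can be checked numerically. This C sketch runs the Euler iteration from $(1, 0)$ and returns $x^2 + y^2$; since each step multiplies the squared radius by exactly $1 + dt^2$, after $k$ steps it is $(1 + dt^2)^k$ up to rounding. The function names are hypothetical.

```c
#include <assert.h>

/* One Euler step for x' = -y, y' = x, i.e. X(k+1) = X(k) (I + J dt). */
static void euler_step(double *x, double *y, double dt)
{
    double x0 = *x, y0 = *y;
    *x = x0 - y0 * dt;
    *y = y0 + x0 * dt;
}

/* Iterate k Euler steps from X(0) = (1, 0); return x^2 + y^2. */
double euler_radius2(int k, double dt)
{
    assert(k >= 0 && dt > 0.0);
    double x = 1.0, y = 0.0;
    for (int i = 0; i < k; i++)
        euler_step(&x, &y, dt);
    return x * x + y * y;
}
```

For example, `euler_radius2(100, 0.1)` is about $1.01^{100} \approx 2.70$, although the exact solution stays on the unit circle.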

5.1.2  Second-Order  Example 

In  this  section,  we  use  the  same  differential  equation  problem,  with  a  different  solver,  a  second-order  one  that  is 
almost  able  to  converge  properly.  We  therefore  have 

$x' = -y$,
$y' = x$,
$x(0) = 1$,
$y(0) = 0$,
as above. Our initial conditions are
$(x : 1;\ 0)$,
$(y : 0;\ 0)$,
as before.

The method we use is a simplified second-order Runge-Kutta method [?], [?], which basically amounts to averaging the usual Euler approximation in an interval with a linear reapproximation at the endpoint of the interval. At a given time $t = t_0$, if we have
$x(t_0) = x_0$,
$y(t_0) = y_0$,
then we have
$x(t_0 + dt) = x_0 - y_0 \cdot dt - x_0 \cdot dt^2/2$,
$y(t_0 + dt) = y_0 + x_0 \cdot dt - y_0 \cdot dt^2/2$,
and it is the extra $dt^2$ terms that make the method second-order.

As above, we assume equal time intervals and get an iteration
$x(0) = 1$,
$y(0) = 0$,
$x(k+1) = x(k) - y(k) \cdot dt - x(k) \cdot dt^2/2$,
$y(k+1) = y(k) + x(k) \cdot dt - y(k) \cdot dt^2/2$,
which can be written as a vector equation
$(x, y)(0) = (1, 0)$,
$$(x, y)(k+1) = (x, y)(k) \begin{pmatrix} 1 - dt^2/2 & dt \\ -dt & 1 - dt^2/2 \end{pmatrix},$$
and we have as above
$X(0) = (1, 0)$,
$X(k+1) = X(k)(I \cdot (1 - dt^2/2) + J \cdot dt)$,
so
$X(k) = (1, 0)(I \cdot (1 - dt^2/2) + J \cdot dt)^k$,
which can be computed exactly.

Since the eigenvalues of $(I \cdot (1 - dt^2/2) + J \cdot dt)$ are $1 - dt^2/2 \pm i \cdot dt$, which have magnitude $\sqrt{1 + dt^4/4} > 1$, this simple method still does not converge (but it diverges much more slowly).
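The same numerical check applies to the second-order step: each step multiplies $x^2 + y^2$ by $(1 - dt^2/2)^2 + dt^2 = 1 + dt^4/4$. This C sketch uses hypothetical function names.

```c
#include <assert.h>

/* One step of the simplified second-order method for x' = -y, y' = x. */
static void rk2_step(double *x, double *y, double dt)
{
    double x0 = *x, y0 = *y;
    *x = x0 - y0 * dt - x0 * dt * dt / 2.0;
    *y = y0 + x0 * dt - y0 * dt * dt / 2.0;
}

/* Iterate k steps from X(0) = (1, 0); return x^2 + y^2. */
double rk2_radius2(int k, double dt)
{
    assert(k >= 0 && dt > 0.0);
    double x = 1.0, y = 0.0;
    for (int i = 0; i < k; i++)
        rk2_step(&x, &y, dt);
    return x * x + y * y;
}
```

With the same step size as before, `rk2_radius2(100, 0.1)` is only about $1.000025^{100} \approx 1.0025$: still growing, but far more slowly than Euler's method.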


5.1.3  Higher-Order  Example 

A similar analysis of the usual 4th-order Runge-Kutta method leads to an iteration
$x(t_0 + dt) = x_0 - y_0 \cdot dt - x_0 \cdot dt^2/2 + y_0 \cdot dt^3/6 + x_0 \cdot dt^4/24$,
$y(t_0 + dt) = y_0 + x_0 \cdot dt - y_0 \cdot dt^2/2 - x_0 \cdot dt^3/6 + y_0 \cdot dt^4/24$,
with matrix
$$\begin{pmatrix} 1 - dt^2/2 + dt^4/24 & dt - dt^3/6 \\ -dt + dt^3/6 & 1 - dt^2/2 + dt^4/24 \end{pmatrix},$$
and eigenvalue magnitude $\sqrt{1 - dt^6/72 + dt^8/576}$, which is still not equal to one: for $0 < dt < 2\sqrt{2}$ it is slightly less than one, so the iterates spiral slowly inward instead of outward. In fact, since this equation (in $(x, y)$ space) represents moving around a circle, any extrapolation method based on a truncated series at a single point only approximates the circle; for Euler's method the failure is easy to see, since all of the tangent vectors point outward from the circle. We note that the iteration equations do have the first terms of the usual Maclaurin series for $\sin(dt)$ and $\cos(dt)$, so we try out a different iteration:
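$x(t_0 + dt) = x_0 \cos(dt) - y_0 \sin(dt)$,
$y(t_0 + dt) = y_0 \cos(dt) + x_0 \sin(dt)$,
which can be written as a vector equation
$(x, y)(0) = (1, 0)$,
$$(x, y)(k+1) = (x, y)(k) \begin{pmatrix} \cos(dt) & \sin(dt) \\ -\sin(dt) & \cos(dt) \end{pmatrix},$$
and we have as above
$X(0) = (1, 0)$,
$X(k+1) = X(k)(I \cdot \cos(dt) + J \cdot \sin(dt))$,
so (since $J^2 = -I$, these matrices multiply like the complex numbers $\cos(dt) + i \sin(dt)$)
$X(k) = (1, 0)(I \cdot \cos(dt) + J \cdot \sin(dt))^k = (1, 0)(I \cdot \cos(k \cdot dt) + J \cdot \sin(k \cdot dt))$,
and
$x(k \cdot dt) = \cos(k \cdot dt)$,
$y(k \cdot dt) = \sin(k \cdot dt)$,
from which we can hazard a guess as to the correct solution.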
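The contrast between the fourth-order step and the rotation step can be checked the same way. In this C sketch the caller supplies $c = \cos(dt)$ and $s = \sin(dt)$ (for example from `cos` and `sin` in `<math.h>`), so the sketch itself needs no math library; the function names are hypothetical.

```c
#include <assert.h>

/* One 4th-order Runge-Kutta step for x' = -y, y' = x (matrix form above). */
static void rk4_step(double *x, double *y, double dt)
{
    double x0 = *x, y0 = *y;
    double a = 1.0 - dt * dt / 2.0 + dt * dt * dt * dt / 24.0;  /* diagonal */
    double b = dt - dt * dt * dt / 6.0;                         /* off-diagonal */
    *x = a * x0 - b * y0;
    *y = b * x0 + a * y0;
}

double rk4_radius2(int k, double dt)
{
    assert(k >= 0 && dt > 0.0);
    double x = 1.0, y = 0.0;
    for (int i = 0; i < k; i++)
        rk4_step(&x, &y, dt);
    return x * x + y * y;
}

/* Exact rotation step; c = cos(dt), s = sin(dt) supplied by the caller. */
static void rotation_step(double *x, double *y, double c, double s)
{
    double x0 = *x, y0 = *y;
    *x = c * x0 - s * y0;
    *y = s * x0 + c * y0;
}

double rotation_radius2(int k, double c, double s)
{
    double e = c * c + s * s - 1.0;
    assert(k >= 0 && e < 1e-12 && -e < 1e-12);   /* (c, s) on the unit circle */
    double x = 1.0, y = 0.0;
    for (int i = 0; i < k; i++)
        rotation_step(&x, &y, c, s);
    return x * x + y * y;
}
```

With $dt = 0.1$, 1000 fourth-order steps shrink $x^2 + y^2$ by roughly $1.4 \times 10^{-5}$, while the rotation keeps it at 1 to within rounding error.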


5.2  Measurement 

Let  us  take  a  simple  system  in  which  the  velocity  and  position  are  occasionally  known  through  inexact  measurement. 
Our  state  variables  are  p  for  the  position,  v  for  the  velocity,  and  a  for  the  unknown  acceleration. 

We assume that the acceleration $a$ is bounded by some constant $A$, so that for any times $t_0 < t_1$,
$|v(t_1) - v(t_0)| \le |t_1 - t_0| \cdot A$.

We assume that we have characterizers
$(a : a(t);\ t_{i-1}, t_i)$
that describe the acceleration, and model characterizers
$(v = p';\ 0, \infty)$,
$(a = v';\ 0, \infty)$.

Therefore, we can compute the velocity and position by
$v(t) = v(t_0) + \int_{t_0}^{t} a(u)\, du$,
$p(t) = p(t_0) + \int_{t_0}^{t} v(u)\, du$.

The  problem  is  to  choose  measurement  times  and  variables  that  maintain  a  certain  accuracy  in  the  estimates  of 
position. 

We assume that we can measure position within a bound
$|p_{\mathrm{meas}}(t) - p(t)| < P$,
and that we can measure velocity within a bound
$|v_{\mathrm{meas}}(t) - v(t)| < V$,
but that we want either to keep our estimate of position more accurate than the position measurement error (this might or might not be possible) or to use as few measurements as possible.

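We assume first that $p_0, v_0$ are known exactly, and consider an interval $[t_0, t_1)$. We compute
$|v(t_1) - v_0| \le |t_1 - t_0| \cdot A$,
and therefore, taking $p_0 + v_0 \cdot (t_1 - t_0)$ as our estimate of the position at $t_1$,
$|p(t_1) - p_0 - v_0 \cdot (t_1 - t_0)| \le \tfrac{1}{2} |t_1 - t_0|^2 \cdot A$,
so we would have to choose
$\Delta t = t_1 - t_0$
so that
$\Delta t < V/A$
to keep the velocity within bounds, and
$(\Delta t)^2 < 2P/A$
to keep the position within bounds.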

But of course, we don't know $p(t)$ or $v(t)$ exactly after the first time interval, so we need to change the previous derivation a bit.

We assume that we know $p_0$ and $v_0$, and that
$|p(t_0) - p_0| < \Delta p_0$
describes the accuracy of our knowledge of $p(t)$ at time $t = t_0$, and
$|v(t_0) - v_0| < \Delta v_0$
describes the accuracy of our knowledge of $v(t)$ at time $t = t_0$. Then the above inequalities become
$|v(t_1) - v_0| < \Delta v_0 + |t_1 - t_0| \cdot A$,
and therefore
$|p(t_1) - p_0 - v_0 \cdot (t_1 - t_0)| < \Delta p_0 + |t_1 - t_0| \cdot \Delta v_0 + \tfrac{1}{2} |t_1 - t_0|^2 \cdot A$,
so, with $\Delta t = t_1 - t_0$, we have to have
$\Delta t < (V - \Delta v_0)/A$
to keep the velocity within bounds, and
$(\Delta t)^2 < 2(P - \Delta p_0 - \Delta t \cdot \Delta v_0)/A$
to keep the position within bounds.
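Given the two conditions above, the longest admissible interval between measurements can be computed. Because the position bound $\Delta p_0 + \Delta t \cdot \Delta v_0 + \tfrac{1}{2} A (\Delta t)^2$ increases with $\Delta t$, this C sketch caps $\Delta t$ by the velocity condition and then bisects for the position condition (bisection is our choice, to avoid a square root; the closed-form root of the quadratic would work equally well). The function name and argument order are hypothetical.

```c
#include <assert.h>

/* Largest dt keeping both error bounds, given acceleration bound A,
   tolerances V and P, and current estimate errors dp0 and dv0. */
double max_measurement_interval(double A, double V, double P,
                                double dp0, double dv0)
{
    assert(A > 0.0 && dp0 >= 0.0 && dv0 >= 0.0);
    if (V <= dv0 || P <= dp0)
        return 0.0;                        /* bounds already exhausted */
    double hi = (V - dv0) / A;             /* velocity condition */
    if (dp0 + hi * dv0 + 0.5 * A * hi * hi <= P)
        return hi;                         /* position condition also holds */
    double lo = 0.0;                       /* position error grows with dt */
    for (int i = 0; i < 200; i++) {
        double mid = 0.5 * (lo + hi);
        if (dp0 + mid * dv0 + 0.5 * A * mid * mid <= P)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

For example, with $A = 2$, $V = 10$, $P = 1$, $\Delta p_0 = 0.25$ and $\Delta v_0 = 0.5$, the position condition binds and the result is about $0.6514$.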

At  this  point,  we  are  stuck  unless  we  can  say  something  more  helpful  about  the  acceleration.  Suppose  we  know  that 
the  acceleration  jumps  around,  and  that  it  has  a  distribution  of  values  with  mean  0  and  variance  R.  In  this  case,  we 
might  be  able  to  reduce  the  estimates  for  position  and  velocity  and  improve  the  time  intervals. 

References 

[1]  P.  Henrici,  Elements  of  Numerical  Analysis,  Wiley  (1964) 

[2] J. Stoer, R. Bulirsch, Einführung in die Numerische Mathematik, II, Springer (1973)


REPORT  DOCUMENTATION  PAGE 

Form Approved

OMB No. 0704-0188


1. AGENCY USE ONLY (Leave blank)  2. REPORT DATE  3. REPORT TYPE AND DATES COVERED

February 13, 1995  Technical Report

4.  TITLE  AND  SUBTITLE 

Notes on Symbol Dynamics

5. FUNDING NUMBERS

N00014-91-C-0195 and
DASG-60-92-C-0055

6.  AUTHOR(S) 

Ashok  K.  Agrawala  and  Christopher  Landauer 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

University  of  Maryland 

A.V.  Williams  Building 

College  Park,  Maryland  20742 

8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 

CS-TR-3411
UMIACS-TR-95-15

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Honeywell
3660 Technology Drive
Minneapolis, MN 55418

Phillips Labs
3550 Aberdeen Ave. SE
Kirtland AFB, NM 87117-5776

10. SPONSORING/MONITORING
AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a.  DISTRIBUTION /AVAILABILITY  STATEMENT 

12b.  DISTRIBUTION  CODE 

13. ABSTRACT (Maximum 200 words)

This paper introduces a new formulation of dynamic systems that
subsumes both the classical discrete and differential equation
models as well as current trends in hybrid models. The key idea
is to express the system dynamics using symbols to which the notion
of time is explicitly attached. The state of the system is
described using symbols which are active for a defined period
of time. The system dynamics is then represented as relations
between the symbolic representations. We describe the notation
and give several examples of its use.


14. SUBJECT TERMS

C.m,  Miscellaneous 


15.  NUMBER  OF  PAGES 
11  pages 


16.  PRICE  CODE 


17. SECURITY CLASSIFICATION OF REPORT

Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE

Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT

Unclassified

20. LIMITATION OF ABSTRACT

Unlimited




Implementation of the MPL Compiler

Jan  M.  Rizzuto  and  James  da  Silva 


Institute  for  Advanced  Computer  Studies 
Department  of  Computer  Science 
University  of  Maryland 
College  Park,  MD  20742 

February  14,  1995 


Abstract 

The  Maruti  Real-Time  Operating  System  was  developed  for  applications  that  must 
meet  hard  real-time  constraints.  In  order  to  schedule  real-time  applications,  the  timing 
and  resource  requirements  for  the  application  must  be  determined.  The  development 
environment  provided  for  Maruti  applications  consists  of  several  stages  that  use  various 
tools  to  assist  the  programmer  in  creating  an  application.  By  analyzing  the  source  code 
provided  by  the  programmer,  these  tools  can  extract  and  analyze  the  needed  timing  and 
resource  requirements.  The  initial  stage  in  development  is  the  compilation  of  the  source 
code  for  an  application  written  in  the  Maruti  Programming  Language  (MPL).  MPL  is 
based on the C programming language. The MPL Compiler was developed to provide
support  for  requirement  specification.  This  report  introduces  MPL  and  describes  the 
implementation  of  the  MPL  Compiler. 
1 Introduction


A real-time system requires that an application meet the timing constraints specified for it.
For hard real-time, a failure to meet the specified timing constraints may result in a fatal
error [2]. Timing constraints are not as critical for soft real-time. The Maruti Operating
System was developed to meet the real-time constraints required by many applications. In
order  to  schedule  and  run  an  application  under  Maruti,  the  timing  and  resource  requirements 
for  that  application  must  be  determined.  The  development  environment  for  Maruti  consists 
of  several  tools  that  can  be  used  to  extract  and  analyze  these  requirements  [2]. 

The  Maruti  Programming  Language  (MPL)  is  a  language  developed  to  assist  users  in 
creating  applications  that  can  be  run  under  Maruti.  MPL  is  based  on  the  C  programming 
language,  and  assumes  the  programmer  is  familiar  with  C.  MPL  provides  some  additional 
constructs  that  are  not  part  of  standard  C  to  allow  for  resource  and  timing  specification  [1]. 
In  addition,  when  an  MPL  file  is  compiled,  some  of  the  resource  requirements  can  be 
recognized  and  recorded  to  an  output  file.  This  output  file  is  used  as  input  to  the  integration 
stage,  which  is  the  next  stage  in  the  development  cycle.  During  integration,  additional 
timing  requirements  may  be  specified. 

Previously,  an  MPL  file  was  compiled  by  first  running  the  source  code  through  the 
Maruti pre-compiler, which created a C file that was then compiled using a C compiler [1].
The  pre-compiler  extracted  the  necessary  information,  and  converted  the  MPL  constructs 
that  were  not  valid  C  statements  into  C  code.  This  required  the  additional  pass  of  the 
pre-compiler  over  the  source  code.  We  have  created  a  compiler  for  MPL  that  integrates 
both  the  actions  of  the  pre-compiler  and  the  compiler  into  one  stage.  In  this  report,  we 
present  MPL,  and  a  description  of  the  compiler  we  implemented.  Section  2  defines  the 
abstractions  used  in  Maruti.  In  Section  3,  the  syntax  of  the  constructs  unique  to  MPL  is 
defined.  The  details  of  the  implementation  of  the  compiler  are  given  in  Section  4.  Section  5 
describes  the  resource  information  that  is  recorded  during  compilation.  Conclusions  appear 
in  Section  6,  followed  by  an  Appendix  containing  a  sample  MPL  file,  and  the  resource 
information  recorded  for  that  file. 

2  Maruti  Abstractions 

An MPL application is broken up into units of computation called elemental units (EUs).
Execution within an EU is sequential, and resource and timing requirements are specified
for each EU. A thread is a sequential unit of execution that may consist of multiple EUs.
MPL allows threads of execution to be specified by the programmer through several of the
constructs provided. A task consists of a single address space, and threads that execute in
that address space. Modules contain the source code of the application as defined by the
programmer.  An  application  may  consist  of  several  modules.  During  execution,  modules 
are  mapped  to  one  or  more  tasks. 

3  MPL  Constructs 

There  are  several  constructs  defined  in  MPL  that  are  not  a  part  of  standard  C.  These 
constructs  have  been  implemented  in  the  MPL  compiler. 


3.1  Module  Name  Specification 

A module may consist of one or more source files written in MPL. At the start of each MPL
file, the name of the module to which the source file belongs must be indicated, using the
following syntax:

module-name-spec ::= 'module' <module-name>

The module-name may be any valid identifier accepted by standard C. The module
name specification must appear at the beginning of the source file, before any other MPL
code. The specification is not compiled into any executable code; it simply indicates
the module to which the functions within the file belong.

3.2  Shared  Buffers 

A shared buffer can be used to declare memory that may be shared by several tasks, to
permit communication between the tasks. A shared buffer declaration requires a type,
just as a variable declaration does. The syntax of a shared declaration is:

shared-buffer-decl ::= 'shared' <type-specifier> <shared-buffer-name>

The shared-buffer-name can be any valid identifier, and the type-specifier can be any
valid type for a variable. A shared declaration is compiled as a pointer to the type given in
the declaration of the shared buffer, rather than as the type itself.
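To make the pointer conversion concrete, the following C sketch shows what a declaration such as `shared time_type global_time;` (taken from the sample in the Appendix) amounts to after compilation. The report does not describe how Maruti allocates and maps the shared memory, so the static object `demo_shared_block` and the function `attach_shared_demo()` are hypothetical stand-ins for that run-time step.

```c
#include <stddef.h>

typedef struct {
    int seconds;
    int minutes;
    int hours;
} time_type;

/* MPL source:   shared time_type global_time;
 * compiled as:  time_type *global_time;          */
time_type *global_time = NULL;

/* Hypothetical stand-in for the memory block that the run-time system
 * would make available to every task sharing the buffer. */
static time_type demo_shared_block;

void attach_shared_demo(void)
{
    global_time = &demo_shared_block;
}

/* Code that uses the shared buffer therefore dereferences it with '->'. */
void reset_time(void)
{
    global_time->seconds = 0;
    global_time->minutes = 0;
    global_time->hours = 0;
}
```

Because the buffer is a pointer, the sample `maruti_main()` in the Appendix accesses its fields with `->` rather than `.`.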

3.3  Region  Constructs 

There are two constructs used to allow for mutual exclusion within an application.

3.3.1  Region  Statement

The region statement is used to enforce mutual exclusion globally throughout an entire
application, and is given by the syntax:

region-statement ::= 'region' <region-name> '{' mpl-statements '}'

The mpl-statements may be any number of valid MPL statements. These statements
make up a critical section.


3.3.2  Local  Region  Statement 

The local_region statement is used to enforce mutual exclusion within a task, and follows
the same syntax as the region statement:

local-region-statement ::= 'local_region' <local-region-name> '{' mpl-statements '}'


3.4  Channel  Declarations 


Channels are used to allow for message passing within a Maruti application. Each channel
declared has a type associated with it, given by a valid C type-specifier. This type indicates
the type of data that the channel will carry.

Channels may be declared in both entry and service functions, which are defined
below. The syntax for channel declarations is:

channel-declaration-list-opt ::= { channel-declaration-list }.

channel-declaration-list ::= channel-declaration { channel-declaration }.

channel-declaration ::= channel-type channels.

channel-type ::= 'out' | 'in' | 'in_first' | 'in_last'.

channels ::= channel { channel }.

channel ::= <channel-name> ':' type-specifier.

3.5  Entry  Functions 

An  entry  function  is  a  special  type  of  function  that  may  be  defined  in  an  MPL  source  file. 
Each  entry  function  corresponds  to  a  thread  within  the  application.  The  syntax  for  an  entry 
function  definition  is: 

entry-function ::= 'entry' <entry-name> '(' ')' entry-function-body.
entry-function-body ::= channel-declaration-list-opt mpl-function-body.

3.6  Service  Functions 

Service functions are another type of special function supported by MPL. A service function
is invoked when a message is received from a client. Each service function definition requires
that an in channel and a message buffer be included in the definition. The service function will
be executed when there is a message on the channel given in the definition. The definition
of a service function is similar to that of an entry function:

service-function ::= 'service' <service-name>
    '(' <in-channel-name> ':' type-specifier ',' <msg-ptr-name> ')'
    service-function-body.

service-function-body ::= channel-declaration-list-opt mpl-function-body.

3.7  Communication  Function  Calls 

There are several library functions used to allow for message passing within a Maruti
application.

3.7.1  Send  Calls

Each call to the send function must specify an outgoing channel for the message:
void send ( channel channel_name, void *message_ptr );


3.7.2  Receive  and  Optreceive  Calls 

Both receive calls and optreceive calls must be associated with an incoming channel (in,
in_first, or in_last):

void receive ( channel channel_name, void *message_ptr );
int optreceive ( channel channel_name, void *message_ptr );

A call to receive requires that there be a message on the incoming channel. Optreceive
should be used when a message may or may not be on the channel. Optreceive checks for
the message and returns a value indicating whether a message was found.

3.8  Initialization  Function 

Each task has an initialization routine that is executed when the application is loaded. This
function is specified by the user with the following name and arguments:

int maruti_main (int argc, char **argv)


4  Implementation 

We started with version 2.5.8 of the GNU C compiler. By modifying the source code for
the C compiler, we have created a compiler for applications written in MPL. In addition
to what the standard GNU C compiler does, this modified compiler handles the additional
constructs defined in MPL and records information about the source code that is needed
by Maruti. A source code file written in MPL is identified by an mpl extension.

4.1  Modifications  to  GCC  File  Structure 

In the process of modifying the compiler, some existing files were modified. In addition,
some new files were created. The source code for version 2.5.8 of GCC allows compilers
to be created for several different languages: C, C++, and Objective C. GCC uses a
different executable file for each language that it compiles: cc1, cc1plus, and cc1obj for
C, C++, and Objective C, respectively. The GCC driver, gcc.c, uses the extension of the
specified source file to determine the appropriate executable (and therefore language)
to compile the source file. The driver then executes the compiler, passing on the
appropriate switches. The driver was modified to accept input files with an mpl extension.
The new executable cc1mpl was created to compile MPL source files. When a file
with an mpl extension is specified as a source file to be compiled, this new executable is
used. When an MPL file is compiled, the driver automatically passes on the switch
-Maruti_output, which indicates that the needed output should be recorded to a file with
an eu extension.
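The extension-based dispatch can be pictured with the sketch below. It is not the code in gcc.c, which selects the compiler through its internal table of spec strings; the table, the suffixes other than mpl, and the function `compiler_for()` are illustrative assumptions, and only the executable names come from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative mapping from source-file extension to the compiler proper.
 * GCC 2.5.8 encodes this in its spec table; this is a simplification. */
struct lang_entry {
    const char *suffix;    /* file extension, including the dot */
    const char *compiler;  /* executable that compiles that language */
};

static const struct lang_entry lang_table[] = {
    { ".c",   "cc1"     },
    { ".cc",  "cc1plus" },
    { ".m",   "cc1obj"  },
    { ".mpl", "cc1mpl"  },   /* entry added for MPL */
};

/* Return the compiler for FILENAME, or NULL if the extension is unknown. */
const char *compiler_for(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    size_t i;

    if (dot == NULL)
        return NULL;
    for (i = 0; i < sizeof lang_table / sizeof lang_table[0]; i++)
        if (strcmp(dot, lang_table[i].suffix) == 0)
            return lang_table[i].compiler;
    return NULL;
}
```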

The  executable  files  for  each  language  are  composed  of  many  object  files.  Some  of  these 
files  are  common  to  all  the  languages,  and  some  of  the  files  are  language-specific.  The 
language-specific  files  added  for  compiling  MPL  files  are  those  files  with  an  mpl-  prefix. 

Gperf is a tool used to generate a perfect hash function for a set of words. Gperf is used
to create a hash function for the reserved words of each language. The files containing
the input to gperf are indicated by a file name with a gperf extension. There are several
different *.gperf files containing the reserved words for the different languages recognized by
the compiler. The mpl-parse.gperf file contains all the reserved words for C, in addition to
those added for MPL. For each language, the output from running gperf is then incorporated
into the *-lex.c file. This output includes a function is_reserved_word() that is used to check
whether a token is a reserved word. The file mpl-lex.c is basically the c-lex.c file, with the output
of running gperf on mpl-parse.gperf instead of c-parse.gperf.

The file maruti.c contains the routines that have been written to implement MPL. This
file is linked in with the executable for all of the languages, to prevent undefined symbol
errors from occurring. Calls to the routines contained in this file occur in both the
language-specific and the common files. The flag maruti_dump is set in main() to indicate
whether information about the source code should be recorded to the appropriate output file.
This flag prevents the calls to the routines in maruti.c that are made in the common files from
occurring for languages other than MPL. The files containing these calls are:

•  calls.c

•  explow.c

•  expr.c

•  function.c

•  toplev.c
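The flag-guarded call pattern can be sketched as follows. Only the `maruti_dump` flag is taken from the text; the recording routine `maruti_record_call()`, the wrapper `note_function_call()` and the counter are hypothetical names used for illustration.

```c
#include <stdio.h>

/* Set in the compiler's main(); nonzero only when an MPL file is compiled. */
int maruti_dump = 0;

/* Number of records written so far (kept only for this illustration). */
static int records_written = 0;

/* Hypothetical stand-in for one of the recording routines in maruti.c. */
static void maruti_record_call(FILE *out, const char *callee)
{
    fprintf(out, "calls %s\n", callee);
    records_written++;
}

/* A call site in a common file such as calls.c guards the call with the
 * flag, so nothing is recorded when C, C++ or Objective C is compiled. */
void note_function_call(FILE *out, const char *callee)
{
    if (maruti_dump)
        maruti_record_call(out, callee);
}
```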

There are several reasons why new language-specific files had to be created for
MPL. The files mpl-lex.h and mpl-lex.c needed to be created for MPL because MPL contains
several additional reserved words not present in C, as mentioned earlier. The file c-common.c
relies on information in the header file c-lex.h. Since MPL uses mpl-lex.h, mpl-common.c
includes mpl-lex.h instead of c-lex.h. Bison is a tool that allows a programmer to define
a grammar through rules, and converts them into a C program that will parse an input
file. The *-parse.y files are the bison files used to create the grammar to parse a source
file. Since the grammar for MPL needed to be modified to accept the additional constructs,
the mpl-parse.y file was created. There is one function used in compiling MPL source files
that is defined in mpl-parse.y instead of maruti.c. This function needed to access the static
variables declared in mpl-parse.y, and in order to do so, the function definition was placed
in that file. Finally, the file mpl-decl.c was created because of its dependence on mpl-lex.h,
and also to allow for an additional type specification used in MPL.

4.2  Compiling  MPL  Constructs 

MPL extends the C language to allow for various constructs. In order to implement these
extensions, the grammar used to recognize C in GCC had to be extended. The following
are recognized as reserved words for MPL, in addition to the standard reserved words for C:
shared, region, local_region, module, in, out, in_first, in_last, entry, service, send, receive,
and optreceive. The keywords in and out were already reserved words in the c-* files, because
they are used by Objective C, but in MPL they are used as channel types. In addition to
the new reserved words, rules were added and existing rules modified; the result is the
grammar in mpl-parse.y.
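A minimal sketch of a reserved-word check for the MPL additions follows. GCC's generated `is_reserved_word()` uses a gperf perfect hash over all reserved words; this version swaps in a plain linear search over only the thirteen words listed above, and the name `is_mpl_keyword()` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Reserved words that MPL adds to C. A linear search stands in for the
 * gperf-generated perfect hash purely for illustration. */
static const char *const mpl_keywords[] = {
    "shared", "region", "local_region", "module",
    "in", "out", "in_first", "in_last",
    "entry", "service", "send", "receive", "optreceive",
};

/* Return 1 if TOKEN is one of the MPL-specific reserved words, else 0. */
int is_mpl_keyword(const char *token)
{
    size_t i;

    for (i = 0; i < sizeof mpl_keywords / sizeof mpl_keywords[0]; i++)
        if (strcmp(token, mpl_keywords[i]) == 0)
            return 1;
    return 0;
}
```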

4.2.1  Module  Name  Specification 

A rule was added to the grammar to parse the module name specification in an MPL file.
The rule for a whole program was also modified to include this module statement. This
rule expects the module statement to appear before any other definitions. Since the module
name specification does not result in any executable code, the only action taken is to record
the module name given by the programmer.

4.2.2  Shared  Buffers

No rules are added to the grammar for a shared buffer declaration. When a variable
declaration is parsed, a tree is created that keeps track of all the specification information
given for that declaration. For example, typedef and extern are two of the possible type
specifications. The token shared is recognized as a type specification, just as typedef and
extern are recognized. When a declaration is made, these specifications are processed in
the function grokdeclarator() in mpl-decl.c. When a shared specification is encountered,
the declaration is converted to a pointer to the type specified, instead of just the type
specified. Other than this conversion to a pointer, the declaration is compiled just as any
other declaration would be compiled in C.

4.2.3  Region  Constructs 

The region constructs are considered statements in MPL. Several rules were added to parse
these constructs, and the region and local_region statements were added as options for a
valid statement in the grammar for MPL.

Both region and local_region statements are compiled in the same manner. Each region
has a name and a body, which is the code within the critical section. In order to protect
these critical sections, calls are made to the Maruti library function maruti_eu(). When a
region is parsed, the compiler generates two calls to maruti_eu() in addition to the code
in the body of the region. The first call is generated just before the body, and the second
call just after. These calls are generated through functions in maruti.c. The functions are
based on the actions that would have been taken had the parser actually parsed the calls
to maruti_eu() in the source file.
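The following C sketch shows the code that is effectively produced for a region. The region name `counter_region` and the variable `shared_counter` are hypothetical, and `maruti_eu()` is replaced by a stub that only counts calls, because the report does not describe the library function beyond its name.

```c
/* Hypothetical stub for the Maruti library call; counting the calls is
 * enough to show where the EU boundaries fall. */
static int eu_boundaries = 0;

void maruti_eu(void)
{
    eu_boundaries++;
}

int shared_counter = 0;

/* MPL source:
 *     region counter_region {
 *         shared_counter++;
 *     }
 * Equivalent C after compilation: */
void increment_in_region(void)
{
    maruti_eu();        /* generated just before the region body */
    shared_counter++;   /* the critical section */
    maruti_eu();        /* generated just after the region body */
}
```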

4.2.4  Channel  Declarations

The  rules  added  for  a  channel  declaration  allow  any  number  of  channels  to  be  declared  in 
either  an  entry  or  a  service  function.  Each  channel  declaration  requires  several  pieces  of 
information: 

•  Channel-type 

•  Channel-name 

•  Type  specifier  indicating  the  type  of  data  that  channel  carries 

A  linked  list  of  declared  channels  is  maintained.  For  each  declared  channel  the  following 
information  is  saved: 

•  Channel-name 

•  Type  information 

1.  Size  in  bytes 

2.  String  encoding  the  type  of  the  data 

•  Channel-id 


The channel-id is a unique identification number assigned to each declared channel.
Channel declarations do not add to the compiled code, and the channels are not allocated
memory. The information describing each channel is simply stored in the linked list. During
compilation, whenever a channel is referenced, the appropriate information is obtained from
this list.
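The channel list can be sketched in C as follows. The report specifies only what is stored for each channel; the field names, the fixed-size name buffers and the functions `declare_channel()` and `find_channel()` are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Channel kinds from the channel-type rule. */
enum channel_type { CH_OUT, CH_IN, CH_IN_FIRST, CH_IN_LAST };

/* One node of the list of declared channels (field names are illustrative). */
struct channel {
    char name[32];            /* channel-name */
    enum channel_type type;   /* out, in, in_first or in_last */
    size_t size;              /* size in bytes of the data carried */
    char type_string[32];     /* string encoding the type, e.g. "$(iii)" */
    int id;                   /* unique channel-id */
    struct channel *next;
};

static struct channel *channel_list = NULL;
static int next_channel_id = 0;

/* Record a declaration. Nothing is emitted into the compiled program and no
 * memory is allocated for the channel itself; only this compiler-side node
 * is created. */
struct channel *declare_channel(const char *name, enum channel_type type,
                                size_t size, const char *type_string)
{
    struct channel *ch = calloc(1, sizeof *ch);

    if (ch == NULL)
        return NULL;
    strncpy(ch->name, name, sizeof ch->name - 1);
    strncpy(ch->type_string, type_string, sizeof ch->type_string - 1);
    ch->type = type;
    ch->size = size;
    ch->id = next_channel_id++;
    ch->next = channel_list;      /* push onto the front of the list */
    channel_list = ch;
    return ch;
}

/* Find a channel when it is referenced; NULL means it was never declared. */
struct channel *find_channel(const char *name)
{
    struct channel *ch;

    for (ch = channel_list; ch != NULL; ch = ch->next)
        if (strcmp(ch->name, name) == 0)
            return ch;
    return NULL;
}
```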


4.2.5  Entry  Functions 

Entry  function  definitions  are  compiled  differently  than  other  function  definitions.  An  entry 
function  would  appear  in  an  MPL  file  in  the  following  form: 

entry <entry_name> ()
<channel_declaration_list_opt>
{
    <mpl_function_body>
}


Here entry_name is an identifier that is the name of the entry function,
channel_declaration_list_opt contains any channels the user wants to define for that
function, and mpl_function_body is any function body that would be accepted as a definition
in a standard MPL function. Semantically, the entry function is equivalent to the following
MPL code:

_maruti_entry_name ()
{
    while (1)
    {
        maruti_eu ();
        entry_name ();
    }
}

entry_name ()
{
    mpl_function_body
}


An entry function is compiled into two functions, as if the two functions given above had
been part of the source file. The first function is just a stub that repeatedly calls
maruti_eu() and then the second function. As with the generation of function calls,
the routines that generate the code for entry function definitions are based on the actions
that would have been taken had the parser actually parsed the code for the two separate
functions.
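The sketch below writes out the two functions for a hypothetical entry function `tick()`, using a stub `maruti_eu()` that merely counts calls. The generated stub described above loops with `while (1)`; the `iterations` parameter is an assumption added only so that the sketch terminates when run outside Maruti.

```c
/* Hypothetical stubs so the sketch runs outside Maruti. */
static int eu_calls = 0;
static int ticks = 0;

static void maruti_eu(void)
{
    eu_calls++;
}

/* Second function: the body the programmer wrote for "entry tick()". */
void tick(void)
{
    ticks++;
}

/* First function: the generated stub. The real stub loops forever;
 * the iteration bound exists only for this illustration. */
void _maruti_tick(int iterations)
{
    while (iterations-- > 0) {
        maruti_eu();   /* EU boundary before each invocation of the body */
        tick();
    }
}
```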

4.2.6  Service  Functions

Service function definitions are handled very much like entry function definitions. The
syntax of a service function differs slightly from that of an entry function, since it requires
that an incoming channel and a message buffer be defined:


service <service_name> (<in_channel_name> : <type_specifier>, <msg_ptr_name>)
<channel_declaration_list_opt>
{
    <mpl_function_body>
}


Like entry functions, service functions are semantically equivalent to two functions,
where one is simply a stub function calling the second function that is generated:

_maruti_service_name ()
{
    type_specifier _maruti_msg_ptr_name;

    while (1)
    {
        if (optreceive (_maruti_in, id, &_maruti_msg_ptr_name, size))
            service_name (&_maruti_msg_ptr_name);
    }
}

service_name (msg_ptr_name)
type_specifier *msg_ptr_name;
{
    mpl_function_body
}


The service_name, channel_declaration_list_opt, and mpl_function_body are all the same
as described previously for entry functions. In addition, service functions have two other
items specified in their definitions. The first is a channel: every service function requires
that a channel be specified. This channel is always declared as an in channel with the name
in_channel_name. Its type is given by type_specifier, as if it had been declared in the
channel_declaration_list_opt. The channel is used to invoke the service function: it is the
channel read by the optreceive call in the stub function that calls the function containing
the service function body, and when a message is received on this channel, the service
function is executed. The second additional item is a message buffer used by the service
function. The name of this message buffer is given by msg_ptr_name, and its type is given by
type_specifier. This buffer is used to hold the message received from the client that invoked
the service function, and is passed to the second function containing the body of the service
function.
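The following sketch makes the stub's polling behaviour concrete for the `display_time` service from the Appendix, with the body recording the hour instead of printing it so the result can be checked. The single-slot mailbox behind `optreceive()`, the constant `MARUTI_IN`, the channel-id 1 and the `polls` bound are all assumptions standing in for the Maruti run-time library, which the report does not describe; only the argument order follows the compiled form of optreceive given in Section 4.2.7.

```c
#include <stddef.h>
#include <string.h>

typedef struct { int seconds, minutes, hours; } time_type;

/* Hypothetical single-slot mailbox standing in for the real channel. */
static time_type mailbox;
static int mailbox_full = 0;

enum { MARUTI_IN = 0 };   /* assumed encoding of the 'in' channel type */

/* Stand-in for optreceive(channel-type, channel-id, buffer, size): copy a
 * waiting message into BUF and return 1, or return 0 if none is waiting. */
int optreceive(int channel_type, int channel_id, void *buf, size_t size)
{
    (void)channel_type;
    (void)channel_id;
    if (!mailbox_full)
        return 0;
    memcpy(buf, &mailbox, size);
    mailbox_full = 0;
    return 1;
}

static int last_hours = -1;

/* Second function: the service body, given a pointer to the message. */
void display_time(time_type *time)
{
    last_hours = time->hours;
}

/* First function: the generated stub. The real stub polls in while (1);
 * the bound on polls exists only so the sketch terminates. */
void _maruti_display_time(int polls)
{
    time_type _maruti_time;

    while (polls-- > 0) {
        if (optreceive(MARUTI_IN, 1, &_maruti_time, sizeof _maruti_time))
            display_time(&_maruti_time);
    }
}
```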

4.2.7  Communication  Function  Calls 

There are three library functions for message passing, mentioned previously: send,
receive, and optreceive. Calls to any of these three library functions are handled
differently from other function calls. In the MPL grammar, send, receive, and optreceive
are all reserved words. The MPL syntax for all of these calls is the following:

<function-name> (<channel-name>, <parameter-2>);

Channel-name should be a previously declared channel, and parameter-2 should be a
pointer. These function calls must be compiled differently, since these are not the actual
parameters used when the call is generated. In the case of a call to send, the actual
parameters must be as follows:

send (<channel-id>, <parameter-2>, <channel-size>);

In the case of a call to either receive or optreceive, the parameters required are:

receive | optreceive (<channel-type>, <channel-id>, <parameter-2>, <channel-size>);

The channel-type for a receive or optreceive call is an integer generated by the compiler
that indicates an in, in_first, or in_last channel.

When one of these three function calls is encountered, there are special rules in the
grammar to handle it. A function in maruti.c is called which generates the appropriate
parameters, and then the function call itself. These function calls are generated as
mentioned above for the calls to maruti_eu(). The channel-name specified by the user is used
to obtain the necessary parameters. Given the channel name, the linked list of channels is
searched to find the corresponding channel; the channel-id and the channel-size are then
obtained from that node in the linked list. Some type checking is also done at this
stage. The compiler verifies that only an outgoing channel is specified for a send call, and
only an incoming channel for the receive and optreceive calls. The compiler also checks that
any channel referenced has been previously declared.

The  grammar  for  MPL  was  modified  so  that  a  call  to  any  of  the  communication  functions 
may  occur  anywhere  that  a  primary  expression  occurs,  since  that  is  where  other  function 
calls  are  permitted  to  occur. 
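The lookup and checks can be sketched as below. A fixed table of channels replaces the compiler's linked list, and the numeric encoding of channel types, the return codes and the name `resolve_comm_call()` are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

enum channel_type { CH_OUT, CH_IN, CH_IN_FIRST, CH_IN_LAST };
enum comm_call { CALL_SEND, CALL_RECEIVE, CALL_OPTRECEIVE };

struct channel {
    const char *name;
    enum channel_type type;
    int id;
    size_t size;
};

/* Arguments the compiler supplies in place of the channel name. */
struct resolved_call {
    int channel_type;     /* used only by receive/optreceive */
    int channel_id;
    size_t channel_size;
};

/* Illustrative table of declared channels. */
static const struct channel declared[] = {
    { "disp",   CH_OUT, 0, 12 },
    { "inchan", CH_IN,  1, 12 },
};

/* Returns 0 on success, -1 if the channel was never declared,
 * -2 if its direction does not match the call. */
int resolve_comm_call(enum comm_call call, const char *name,
                      struct resolved_call *out)
{
    const struct channel *ch = NULL;
    size_t i;

    for (i = 0; i < sizeof declared / sizeof declared[0]; i++)
        if (strcmp(declared[i].name, name) == 0)
            ch = &declared[i];
    if (ch == NULL)
        return -1;
    /* send needs an out channel; receive/optreceive need an in channel */
    if ((call == CALL_SEND) != (ch->type == CH_OUT))
        return -2;
    out->channel_type = (int)ch->type;
    out->channel_id = ch->id;
    out->channel_size = ch->size;
    return 0;
}
```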

4.2.8  Initialization  Function 

The user-defined function maruti_main() is compiled as an ordinary C function.

5  PEUG  File 

The source code of an MPL file is broken up into elemental units. Each elemental unit
identifies the resources that it requires. These elemental units are used later in the
development process for scheduling the application. The output file created by the MPL
compiler contains a Partial Elemental Unit Graph (PEUG) for the given source file. The name
of this file is the name of the source file, with the mpl extension replaced by an eu extension.
Several different types of information are recorded in this PEUG file.
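Deriving the output name is a simple suffix substitution. A minimal sketch, using the hypothetical helper `peug_file_name()`, is:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the PEUG file name from an MPL source name by
 * replacing a trailing ".mpl" with ".eu". Returns 0 on success, -1 if SRC
 * does not end in ".mpl" or OUT is too small. */
int peug_file_name(const char *src, char *out, size_t out_size)
{
    const char *ext = ".mpl";
    size_t len = strlen(src);
    size_t ext_len = strlen(ext);
    int n;

    if (len < ext_len || strcmp(src + len - ext_len, ext) != 0)
        return -1;
    n = snprintf(out, out_size, "%.*s.eu", (int)(len - ext_len), src);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return 0;
}
```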

5.1  Module  Name 

The first line in the output file indicates the name of the module, and appears as:
peug <module-name>

The  module-name  is  taken  directly  from  the  module  name  specification  given  in  the  MPL 
source  file. 


5.2  File  Name 


The second line in the output file indicates the name of the target file that is created by the
compiler, where file-name is the target:

file <file-name>


5.3  Shared  Buffers 

Each time a shared buffer is declared, its name and type information are recorded to the
output file:

shared <shared-buffer-name> : (<type-description-string>, <type-size>)

The type-description-string and type-size of a shared buffer are obtained from the
type specification, and are represented in the same manner as the type and size for a
channel. Although the shared buffer is actually a pointer to the type it is declared as, the
type-description-string represents the object being pointed to, and not the pointer itself.
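A sketch of writing this record follows, using the values that appear for `global_time` in Appendix B. The helper name `format_shared_line()` is hypothetical, and it writes into a caller-supplied buffer rather than the output file so the result is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the PEUG record for a shared buffer.
 * TYPE_STRING and TYPE_SIZE describe the pointed-to type, not the
 * pointer that the declaration is compiled into. */
int format_shared_line(char *buf, size_t size, const char *name,
                       const char *type_string, size_t type_size)
{
    return snprintf(buf, size, "shared %s : (%s, %lu)",
                    name, type_string, (unsigned long)type_size);
}
```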

5.4  Entry,  Service,  and  User  Function  Definitions 

In  MPL,  a  user  may  define  ordinary  functions  in  addition  to  the  entry  and  service  functions 
that  are  permitted  in  MPL.  For  each  entry,  service  or  ordinary  user-defined  function,  there 
is  an  entry  in  the  output  file.  This  entry  has  the  following  format: 

<function-type> <function-name>

size <stack-size>

Function-type can be either function, entry, or service, indicating which type of function
is being defined. Function-name is the declared name of the function in the source file.
Stack-size is the maximum stack size needed by this function. This stack-size includes the
arguments pushed onto the stack preceding any function calls occurring within the function
body. Other information concerning the body of the function will also appear between the
function-name and the stack-size. The entry for the maruti_main() function will be the same
as those for other user-defined functions. Entry and service functions will contain some
additional information, not applicable to ordinary functions, that is described below.

5.4.1  Channels 

For  each  channel  that  is  declared,  a  description  of  the  channel  is  written  to  the  output  file. 
These  descriptions  will  occur  right  after  the  statement  indicating  the  name  of  the  current 
function: 


<channel-type> <channel-name> : (<description-string>, <size>)

The channel-type and channel-name will be the type and name specified in the source
file. The description-string and size are based on the type specification in the channel
declaration. Channel descriptions will occur only in entry and service functions. A service
function will always contain at least one channel description, since the syntax of a service
function requires that a channel be named in the definition. A channel description will also be
output for every send, receive, and optreceive call, since these calls require a channel as one
of their parameters.

5.4.2  Function  Calls 

Each time a function call is parsed, there will be a line in the output file:
calls <function-name> {in_cond} {in_loop}

This line indicates where a function call occurs, and which function is being called. The
in_cond and in_loop labels indicate whether this function call appears within a conditional
statement or within a loop. These labels will be seen only if their respective conditions are true.
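The optional labels can be produced as in the following sketch, where `write_calls_line()` is a hypothetical helper that formats the record into a buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one "calls" record. The in_cond and in_loop
 * labels are appended only when the corresponding condition holds. */
int write_calls_line(char *buf, size_t size, const char *callee,
                     int in_cond, int in_loop)
{
    return snprintf(buf, size, "calls %s%s%s", callee,
                    in_cond ? " in_cond" : "",
                    in_loop ? " in_loop" : "");
}
```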

5.4.3  Communication  Function  Calls 

Any  call  to  a  communication  function  is  recorded  similarly  to  other  function  calls.  There  is 
a  line  indicating  the  name  of  the  function,  as  shown  above  for  a  function  call.  In  addition, 
there will be a line describing the channel associated with that communication function call.
This line has the same form as the channel description line described above.

5.4.4  EU  Boundaries 

The  output  file  for  an  MPL  source  file  indicates  where  each  elemental  unit  (EU)  begins  by 
the  following: 

eu <N> {region-list}

The N indicates an EU number. Each EU within a source file has a unique number.
There are several places where EU boundaries are created:

•  Start of a function

•  Start of a region

•  End of a region

•  Explicit calls to maruti_eu()

The initial EU occurring at the beginning of a function that is not a service or entry function
is a special case. It is always labeled as "eu 0" in the output file, and does not represent
an actual EU.

Each EU may also be followed by a list describing one or more regions. This list
represents the regions within which this EU occurs. The description of a region appears as:

(region-name instance access type)

The region-name is the name given by the user, and the type indicates whether a region is local
(local_region construct) or global (region construct). The access indicates whether the access is
read or write. The instance indicates the instance of this region within the source file;
each instance of a region within a source file is unique.
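The EU record with its region list could be formatted as below. The single-letter access codes `r` and `w` and the word `local` for local regions are assumptions, as are the names `struct region_ref` and `format_eu_line()`.

```c
#include <stdio.h>
#include <string.h>

/* One region that an EU lies within (hypothetical representation). */
struct region_ref {
    const char *name;   /* region-name given by the programmer */
    int instance;       /* unique instance number within the source file */
    char access;        /* 'r' or 'w' (assumed encoding) */
    int is_local;       /* 1 for local_region, 0 for region */
};

/* Format an EU boundary line such as "eu 3 (time_region 1 w global)".
 * Returns the number of characters written, or -1 if BUF is too small. */
int format_eu_line(char *buf, size_t size, int eu_number,
                   const struct region_ref *regions, size_t nregions)
{
    size_t used, i;
    int n = snprintf(buf, size, "eu %d", eu_number);

    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;
    for (i = 0; i < nregions; i++) {
        n = snprintf(buf + used, size - used, " (%s %d %c %s)",
                     regions[i].name, regions[i].instance, regions[i].access,
                     regions[i].is_local ? "local" : "global");
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```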




6  Conclusions 


Basing MPL on C has simplified the development of both the language and its compiler.
The language is easy to learn for any programmer who has used C before, since only a
limited number of additional constructs are unique to MPL. Starting from the GCC source
code meant modifying an existing compiler rather than implementing a new one. The source
code for GCC only needed to be modified to handle some additional constructs and produce
some additional output. This made the implementation fairly simple. However, the GCC
C compiler also provides some functionality that is not needed by MPL, and much of this
functionality is not even permitted. These restrictions are not enforced by the
compiler, but should be detected elsewhere within the development cycle.

Prior to the development of the MPL compiler using GCC, compiling an MPL source
file required two steps. The source files were initially passed through a pre-compiler to
extract the available resource information and parse the MPL constructs. The pre-compiler
was responsible for converting the MPL code into valid C code, which was then compiled
using a standard C compiler. The new implementation of the compiler eliminates some
of the redundant processing that is done when the pre-compiler is used. The information
obtained through the pre-compiler already existed in the internal structures used by the GCC
compiler; it just needed to be recorded. Instead of parsing source code files in two
independent steps, the functionality of the pre-compiler has been incorporated into the
compiler itself. The MPL compiler provides a single tool that extracts all the
available information at the initial stage of development.

In the future, a version of MPL may be implemented that is based on the Ada
programming language. GNAT is a compiler for Ada 9X that is being developed at NYU.
GNAT depends on the back end of the GCC compiler. Using the source code for GNAT,
an implementation of MPL based on Ada would be similar to the current implementation
based on C.


Appendix 


A  MPL  File 

The  following  is  a  sample  of  MPL  source  code: 

module timer;

typedef struct {
    int seconds;
    int minutes;
    int hours;
} time_type;

shared time_type global_time;

maruti_main(argc, argv)
int argc;
char **argv;
{
    global_time->seconds = 0;
    global_time->minutes = 0;
    global_time->hours = 0;

    return 0;
}

entry update_second()
out disp : time_type;
{
    time_type msg;

    region time_region {
        global_time->seconds++;
        if (global_time->seconds == 60)
            global_time->seconds = 0;
        msg = *global_time;
    }

    send(disp, &msg);
}

entry update_minute()
out display : time_type;
{
    time_type msg;

    region time_region {
        global_time->minutes++;
        if (global_time->minutes == 60)
            global_time->minutes = 0;
        msg = *global_time;
    }

    send(display, &msg);
}

entry update_hour()
out display : time_type;
{
    time_type msg;

    region time_region {
        global_time->hours++;
        if (global_time->hours == 24)
            global_time->hours = 0;
        msg = *global_time;
    }

    send(display, &msg);
}

service display_time(inchan : time_type, time)
{
    printf("Current Time: %d : %d : %d\n", time->hours, time->minutes, time->seconds);
}




B  PEUG  File 

The corresponding PEUG file for the source code above is:

peug timer
file timer.o

shared global_time : ($(iii), 12)
function maruti_main
eu 0
size 4

entry update_second
out disp : ($(iii), 12)
eu 2
eu 3 (time_region 1 w global)
calls maruti_eu
eu 4
calls maruti_eu
calls send
out disp : ($(iii), 12)
size 32

entry update_minute
out display : ($(iii), 12)
eu 5
eu 6 (time_region 2 w global)
calls maruti_eu
eu 7
calls maruti_eu
calls send
out display : ($(iii), 12)
size 32

entry update_hour
out display : ($(iii), 12)
eu 8
eu 9 (time_region 3 w global)
calls maruti_eu
eu 10
calls maruti_eu
calls send
out display : ($(iii), 12)
size 32

service display_time
in inchan : ($(iii), 12)
eu 11
calls optreceive
in inchan : ($(iii), 12)
calls printf
size 52
Abstract

The allocation problem has always been one of the fundamental issues in building applications on distributed computing systems (DCS). For real-time applications on a DCS, the allocation problem should directly address the issues of task and communication scheduling. In this context, the allocation of tasks has to fully utilize the available processors, and the scheduling of tasks has to meet the specified timing constraints. Clearly, the execution of tasks under the allocation and schedule has to satisfy the precedence, resource, and other synchronization constraints among them.

Recently, real-time systems have emerged whose timing requirements impose relative timing constraints on consecutive executions of each task and specify inter-task temporal relationships across task periods. In this paper we consider the allocation and scheduling problem for periodic tasks with such timing requirements. Given a set of periodic tasks, we consider the least common multiple (LCM) of the task periods. Each task is extended to several instances within the LCM. The scheduling window for each task instance is derived to satisfy the timing constraints. We develop a simulated annealing algorithm as the overall control algorithm. An example problem, a sanitized version of the Boeing 777 Aircraft Information Management System, is solved by the algorithm. Experimental results show that the algorithm solves the problem in reasonable time.
1  Introduction


The task allocation and scheduling problem is one of the basic issues in building real-time applications on a distributed computing system (DCS). A DCS is typically modeled as a collection of processors interconnected by a communication network. For hard real-time applications, tasks must be allocated over the DCS so as to fully utilize the available processors, and scheduled so as to meet their timing constraints. Failure to meet the specified timing constraints or inability to respond correctly can result in disastrous consequences.


For hard real-time applications, such as avionics systems and nuclear power systems, the approach to guaranteeing the critical timing constraints is to allocate and schedule tasks a priori. The essential solution is to find a static allocation under which a feasible schedule exists for the given task sets. Ramamritham [Ram90] proposes a global view in which allocation should directly address the schedulability of the processors and the communication network. A heuristic approach is taken to determine an allocation and find a feasible schedule under that allocation. Tindell et al. [TBW92] take the same global view and exploit a simulated annealing technique to allocate periodic tasks. A distributed rate-monotonic scheduling algorithm is implemented. In each period a task must execute once before the specified deadline. The transmission times of the communications are taken into account by subtracting the total communication time from the deadline, which makes the execution of the task more stringent.

Simply ensuring that one instance of each task starts after its ready time and completes before its deadline is not enough. Some real-time applications have more complicated timing constraints. For example, relative timing constraints may be imposed on consecutive executions of a task, so that two consecutive executions of a periodic task must be separated by a minimum execution interval. A communication latency can be specified to ensure that the time difference between the completion of the sending task and the start of the receiving task does not exceed the specified value. The Boeing 777 Aircraft Information Management System (AIMS) is such an example [CDHC94]. For such applications, the algorithms proposed in the literature do not work, because the timing constraints are imposed across task periods. In this paper, we describe the relative timing constraints of real examples of real-time applications in Section 2. Based on the task characteristics, we propose an approach to allocating and scheduling these applications in Section 3. A simulated annealing algorithm that solves the problem, including the reduction of the search space, is given in Section 4. In Section 5, we evaluate the practicality of the algorithm and show its significance. Instead of randomly generating ad hoc test cases, we apply the algorithm to a real example, the Boeing 777 AIMS, with various numbers of processors, and present the experimental results.


2  Problem  Description 


Various kinds of periodic task models have been proposed to represent real-time system characteristics. One of them models an application as an independent set of tasks, in which each task is executed once every period under ready time and deadline constraints. Synchronization (e.g., precedence and mutual exclusion) and communications are simply ignored. Another model, which takes precedence relationships and communications into account, represents the application as a task graph. In a task graph, tasks are represented as nodes, while communications and precedence relationships between tasks are represented as edges. Absolute timing constraints can be imposed on the tasks. Tasks have to be allocated and scheduled to meet their ready time and deadline constraints in the presence of synchronization and communications. The deficiency of task graph modeling is its inability to specify relative constraints across task periods. For example, one cannot specify the minimum separation interval between two consecutive executions of the same task.

In [CA93], we modified the real-time system characteristics by taking into account the relative constraints on the instances of a task. We considered the scheduling problem of periodic tasks with relative timing constraints. We analyzed the timing constraints and derived the scheduling window for each task instance. Based on the scheduling window, we presented a time-based approach to scheduling a task instance. The task instances are scheduled one by one based on the priorities assigned by the proposed algorithms. In this paper we augment the real-time system characteristics by considering inter-task communication on a DCS.

2.1  Task  Characteristics 

The  problem  considered  in  this  chapter  has  the  following  characteristics. 

•  The Fundamentals: A task $\tau_i$ is denoted by the 4-tuple $\langle p_i, e_i, \lambda_i, \eta_i \rangle$, denoting the period, computation time, low jitter and high jitter, respectively. One instance of a task is executed each period. The execution of a task instance is non-preemptable. The start times of two consecutive instances of task $\tau_i$ are at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart. Let $s_i^j$ and $f_i^j$ be the start time and finish time of task instance $\tau_i^j$, respectively. The timing constraints specified in Equations 1 through 4 must be satisfied.

\begin{align}
f_i^j &= s_i^j + e_i \tag{1}\\
s_i^{n_i+1} &= s_i^1 + LCM \tag{2}\\
s_i^j &\ge s_i^{j-1} + p_i - \lambda_i \tag{3}\\
s_i^j &\le s_i^{j-1} + p_i + \eta_i, \qquad \forall j = 2, \ldots, n_i + 1. \tag{4}
\end{align}

•  Asynchronous Communication: Tasks communicate with each other by sending and receiving data or messages. The frequencies of the sending and receiving tasks of a communication can be different. Consequently, communications between tasks may cross task periods. When such asynchronous communications occur, undersampling semantics is assumed: when two tasks of different frequencies communicate, the message is scheduled only at the lower rate. For example, if task A (of 10 Hz) sends a message to task B (of 5 Hz), then in every 200 ms, one of the two instances of task A has to send a message to one instance of task B. If the sending and receiving tasks are assigned to the same processor, then a local communication occurs. We assume the time taken by a local communication is negligible. When an interprocessor communication (IPC) occurs, the communication must be scheduled on the communications network between the end of the sending task's execution and the start of the receiving task's execution. The transmission time required to communicate message $i$ over the network is denoted by $\mu_i$.

•  Communication Latency: Each communication is associated with a communication latency, which specifies the maximum separation between the start time of the sending task and the completion time of the receiving task.

•  Cyclic Dependency: Research on the allocation problem has usually focused on acyclic task graphs [Ram90, HS92]. Given an acyclic task graph $G = (V, E)$, if the edge from task A to task B is in $E$, then the edge from B to A cannot be in $E$. The use of acyclic task graphs excludes the possibility of specifying cyclic dependency among tasks. For example, consider a situation in which an instance of task A cannot start its execution until it receives data from the last instance of task B. After the instance of task A finishes its execution, it sends data to the next instance of task B. Since tasks A and B are periodic, this communication pattern continues throughout the lifetime of the application. To accommodate this situation, we take cyclic dependency into consideration.
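The jitter constraints in Equations 1 through 4 can be checked mechanically once start times are known. The following Python sketch is illustrative only: the `Task` class and the function names are hypothetical, and it assumes integer times with one start time per period inside a single LCM frame.

```python
from dataclasses import dataclass


@dataclass
class Task:
    period: int       # p_i
    exec_time: int    # e_i
    low_jitter: int   # lambda_i
    high_jitter: int  # eta_i


def finish_times(task: Task, starts: list[int]) -> list[int]:
    """Equation 1: f_i^j = s_i^j + e_i (execution is non-preemptable)."""
    return [s + task.exec_time for s in starts]


def satisfies_jitter(task: Task, starts: list[int], lcm: int) -> bool:
    """Check Equations 2-4 for the instances of one task in an LCM frame.

    `starts` holds s_i^1 .. s_i^{n_i}; s_i^{n_i+1} is derived from Equation 2,
    so the gap that wraps into the next frame is checked as well.
    """
    n = lcm // task.period
    if len(starts) != n:
        raise ValueError("expected one start time per period in the LCM frame")
    extended = starts + [starts[0] + lcm]           # Equation 2
    for prev, cur in zip(extended, extended[1:]):   # j = 2, ..., n_i + 1
        gap = cur - prev
        if gap < task.period - task.low_jitter:    # Equation 3 violated
            return False
        if gap > task.period + task.high_jitter:   # Equation 4 violated
            return False
    return True


# Example: p = 100, e = 10, lambda = eta = 5, LCM = 200.
task = Task(period=100, exec_time=10, low_jitter=5, high_jitter=5)
print(satisfies_jitter(task, [0, 104], 200))  # gaps 104 and 96: True
print(satisfies_jitter(task, [0, 110], 200))  # gap 110 exceeds p + eta: False
```

Note that the wrap-around gap (here from $s_i^2$ to $s_i^1 + LCM$) is checked too, which is why a start time that looks harmless within the frame can still be rejected.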

The timing constraints described above are shown in Figure 1. For periodic tasks A and B, the start times of each and every instance of task execution and communication are pre-scheduled such that (1) the execution intervals fall into the range between $p - \lambda$ and $p + \eta$, and (2) the time window between the start time of the sending task and the completion time of the receiving task is less than the latency of the communication. In Figure 2, we illustrate examples of all possible communication patterns considered in this paper. The communications in the task system are described in the form "From sender-task-id (of frequency) To receiver-task-id (of frequency)". If the sender frequency is n times the receiver frequency and no cyclic dependency is involved, then one of every n instances of the sending task has to communicate with one instance of the receiving task. Examples of this situation are shown in Figures 2.a.1 and 2.a.2. Likewise, for the case in which the receiver frequency is n times the sender frequency and no cyclic dependency is present, the patterns are shown in Figures 2.b.1 and 2.b.2. For an asynchronous communication, the sending (receiving) task with the lower frequency sends (receives) the message to (from) the nearest receiving (sending) task instance, as shown in Figure 2.b (2.a). The cases where cyclic dependency is considered are shown in Figures 2.c and 2.d.

Figure 1: Relative Timing Constraints


2.2  System  Model 

A real-time DCS consists of a number of processors connected together by a communications network. The execution of an instance on a processor is nonpreemptable. To provide predictable communication and to avoid contention for the communication channel at run time, we make the following assumptions: (1) each IPC occurs at the time pre-scheduled when the schedule is generated, and (2) at most one communication can occur on the network at any given time.


(a.2) 

From  A  (of  lOHZ)  to  B  (of  5HZ) 


200  ms 

(c) 

From  A  (of  lOHZ)  to  B  (of  5HZ) 
From  B  (of  5HZ)  to  A  (of  lOHZ) 


(b.2) 

From  A  (of  5HZ)  to  B  (of  lOEZ) 


200  ms  ' 

(d) 

From  A  (of  lOHZ)  to  B  (of  lOHZ) 
From  B  (of  lOHZ)  to  A  (of  lOHZ) 


Figure  2:  Possible  Communication  Patterns 


2.3  Problem  Formulation 


We consider static assignment and scheduling in which a task is the finest-granularity object of assignment and an instance is the unit of scheduling. We apply the simulated annealing algorithm [KGV83] to solve the problem of real-time periodic task assignment and scheduling with hybrid timing constraints. To make the execution of instances satisfy the specifications and meet the timing constraints, we consider a scheduling frame whose length is the least common multiple (LCM) of all task periods. Given a task set $\Gamma$ and its communications $C$, we construct a set of task instances, $I$, and a set of multiple communications, $M$. We extend each task $\tau_i \in \Gamma$ to $n_i$ instances, $\tau_i^1, \tau_i^2, \ldots, \tau_i^{n_i}$. These $n_i$ instances are added to $I$. Each communication $\tau_i \to \tau_j \in C$ is extended to $\min(n_i, n_j)$¹ undersampled communications, where $n_i = LCM/p_i$ and $n_j = LCM/p_j$. These multiple communications are added to $M$. The extension can be stated as follows.

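•  If $n_i < n_j$, then $\tau_i \to \tau_j$ is extended to $\tau_i^1 \to \tau_j^?$, $\tau_i^2 \to \tau_j^?$, ..., and $\tau_i^{n_i} \to \tau_j^?$.

•  If $n_i > n_j$, then $\tau_i \to \tau_j$ is extended to $\tau_i^? \to \tau_j^1$, $\tau_i^? \to \tau_j^2$, ..., and $\tau_i^? \to \tau_j^{n_j}$.

•  If $n_i = n_j$, then $\tau_i \to \tau_j$ is extended to $\tau_i^1 \to \tau_j^1$, $\tau_i^2 \to \tau_j^2$, ..., and $\tau_i^{n_i} \to \tau_j^{n_j}$.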

A task ID with a question-mark superscript indicates some instance of the task. For example, $\tau_i^1 \to \tau_j^?$ means that $\tau_i^1$ communicates with some instance of $\tau_j$. We describe how we assign the nearest instance for each communication in Section 4.1.2.
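The instance extension and the undersampled communication count can be sketched in a few lines of Python. The function names are hypothetical, `None` stands for the question-mark superscript, and periods are assumed to be integers (e.g. in milliseconds).

```python
from functools import reduce
from math import gcd


def frame_length(periods):
    """LCM of all task periods: the length of one scheduling frame."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def extend_communication(p_send, p_recv, lcm):
    """Extend tau_i -> tau_j into min(n_i, n_j) undersampled communications.

    Returns (sender instance, receiver instance) pairs, 1-based;
    None plays the role of the '?' superscript.
    """
    n_s, n_r = lcm // p_send, lcm // p_recv
    if n_s < n_r:
        return [(k, None) for k in range(1, n_s + 1)]
    if n_s > n_r:
        return [(None, k) for k in range(1, n_r + 1)]
    return [(k, k) for k in range(1, n_s + 1)]


# Task A at 10 Hz (period 100 ms) sends to task B at 5 Hz (period 200 ms).
lcm = frame_length([100, 200])
print(lcm, extend_communication(100, 200, lcm))  # 200 [(None, 1)]
```

With A at 10 Hz and B at 5 Hz, the single extended communication $\tau_A^? \to \tau_B^1$ matches the undersampling example of Section 2.1: one of the two instances of A talks to the one instance of B.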

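The problem can be formulated as follows. Given a set of task instances, $I$, and its communications, $M$, we find an assignment $\phi$, a total ordering $\sigma_m$ of all instances, and a total ordering $\sigma_c$ of all communications to minimize

\begin{align}
E(\phi, \sigma_m, \sigma_c) = {} & \sum_{i,j} \delta\left(s_i^{j-1} + p_i - \lambda_i - s_i^j\right) + \sum_{i,j} \delta\left(s_i^j - s_i^{j-1} - p_i - \eta_i\right) \nonumber\\
& + \sum_{i,j} \delta\left(f_i^j - d_i^j\right) + \sum_{\tau_i^j \to \tau_k^l} \delta\left(F(\tau_i^j \to \tau_k^l, \sigma_c) - s_k^l\right) \nonumber\\
& + \sum_{\tau_i^j \to \tau_k^l} \delta\left(f_k^l - s_i^j - \mathrm{Latency}(\tau_i^j \to \tau_k^l)\right) \tag{5}
\end{align}

subject to $s_i^j \ge r_i^j$ and $S(\tau_i^j \to \tau_k^l, \sigma_c) \ge f_i^j$, $\forall\, \tau_i^j \to \tau_k^l$, where

•  $s_i^j$ is the start time of $\tau_i^j$ under $\sigma_m$.

•  $f_i^j$ is the completion time of $\tau_i^j$ under $\sigma_m$.

•  $r_i^j = p_i \times (j - 1) + r_i$, and $d_i^j = p_i \times (j - 1) + d_i$.

•  $\delta(x) = 0$ if $x \le 0$, and $\delta(x) = x$ if $x > 0$.

•  $\phi(\tau_i)$ is the ID of the processor to which $\tau_i$ is assigned.

•  $\tau_i^j \to \tau_k^l$ is the communication from $\tau_i^j$ to $\tau_k^l$. If $\phi(\tau_i) = \phi(\tau_k)$, then $\tau_i^j \to \tau_k^l$ is a local communication.

•  $S(c, \sigma_c)$ is the start time of communication $c$ on the network under $\sigma_c$.

•  $F(c, \sigma_c)$ is the completion time of communication $c$ on the network under $\sigma_c$.

¹ Due to undersampling, when an asynchronous communication is extended to multiple communications, the number of multiple communications is the smaller of the numbers of sender and receiver instances.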

The minimum value of $E(\phi, \sigma_m, \sigma_c)$ is zero. It occurs when the executions of all instances meet the jitter constraints and all communications meet their latency constraints. A feasible multiprocessor schedule can then be obtained by collecting the values of $s_i^j$ and $f_i^j$ for all $i$ and $j$. Likewise, a feasible network schedule can be obtained from the values of $S(c, \sigma_c)$ and $F(c, \sigma_c)$.
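As a rough illustration of how the energy value aggregates violations, the sketch below sums $\delta(\cdot)$ penalties for early or late starts, deadline misses, messages that finish after their receiver starts, and latency violations. It assumes these are the terms of Equation 5, uses hypothetical names and a simple dictionary layout, and ignores communications whose receiving instance falls in the next frame.

```python
def delta(x):
    """delta(x) = 0 if x <= 0, x if x > 0: only violations are penalized."""
    return x if x > 0 else 0


def energy(tasks, s, f, d, comms, lcm):
    """Total constraint violation of a constructed schedule; 0 means feasible.

    tasks: {name: (p, lam, eta)}
    s, f, d: {name: [start, finish, deadline per instance]}
    comms: [(sender, j, receiver, l, net_finish, latency)] with 1-based
           instance numbers; net_finish is F(c, sigma_c), or None for a
           local communication that uses no network time.
    """
    e = 0
    for name, (p, lam, eta) in tasks.items():
        starts = s[name] + [s[name][0] + lcm]          # s_i^{n_i+1}, Equation 2
        for prev, cur in zip(starts, starts[1:]):
            e += delta(prev + p - lam - cur)            # started too early
            e += delta(cur - prev - p - eta)            # started too late
        for fin, dl in zip(f[name], d[name]):
            e += delta(fin - dl)                        # deadline miss
    for snd, j, rcv, l, net_finish, latency in comms:
        if net_finish is not None:
            e += delta(net_finish - s[rcv][l - 1])      # message arrives late
        e += delta(f[rcv][l - 1] - s[snd][j - 1] - latency)  # latency violation
    return e


# One task, p = 100, lambda = eta = 5, LCM = 200: gaps of 120 and 80.
tasks = {"A": (100, 5, 5)}
print(energy(tasks, {"A": [0, 120]}, {"A": [10, 130]}, {"A": [100, 200]}, [], 200))  # 30
```

Each gap misses its bound by 15 (one too long, the wrap-around gap too short), so the energy is 30; a schedule is accepted as feasible only when every term is zero.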

Since the task system is asynchronous and the communication pattern may involve cyclic dependency, we solve the problem of finding a feasible solution $(\phi, \sigma_m, \sigma_c)$ by exploiting the cyclic scheduling technique and embedding it into the simulated annealing algorithm.

3  The  Approach 

3.1  Bounds  of  a  Scheduling  Window 

Define the scheduling window for a task instance as the time interval during which the task can start. Traditionally, the lower and upper bounds of the scheduling window for a task instance are called the earliest start time (est) and latest start time (lst), respectively. These values are given and are independent of the start times of the preceding instances.

We consider the scheduling of periodic tasks with the relative timing constraints described in Equations 3 and 4. The scheduling window for a task instance is derived from the start times of its preceding instances. A feasible scheduling window for a task instance $\tau_i^j$ is a scheduling window in which any start time makes the timing relation between $s_i^{j-1}$ and $s_i^j$ satisfy Equations 3 and 4. Formally, given $s_i^1, \ldots, s_i^{j-1}$, the problem is to derive the feasible scheduling window for $\tau_i^j$ such that a feasible schedule can be obtained if $\tau_i^j$ is scheduled within the window.

Proposition 1 [CA93]: Let the est and lst of $\tau_i^j$ be

\begin{align}
est(\tau_i^j) &= \max\{\, s_i^{j-1} + p_i - \lambda_i,\; s_i^1 + (j-1) \times p_i - (n_i - j + 1) \times \eta_i \,\} \tag{6}\\
lst(\tau_i^j) &= \min\{\, s_i^{j-1} + p_i + \eta_i,\; s_i^1 + (j-1) \times p_i + (n_i - j + 1) \times \lambda_i \,\} \tag{7}
\end{align}

If $s_i^j$ lies between $est(\tau_i^j)$ and $lst(\tau_i^j)$, then the est and lst of $\tau_i^{j+1}$, computed from $s_i^j$ and $s_i^1$, specify a feasible window.
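Equations 6 and 7 translate directly into code. The helper below has a hypothetical name and is not taken from [CA93]; it assumes integer times and $2 \le j \le n_i$.

```python
def scheduling_window(s_first, s_prev, j, n, p, lam, eta):
    """(est, lst) of instance j (2 <= j <= n) per Equations 6 and 7.

    s_first = s_i^1, s_prev = s_i^{j-1}, n = n_i, p = p_i,
    lam = lambda_i (low jitter), eta = eta_i (high jitter).
    """
    remaining = n - j + 1          # gaps still needed to reach s_i^{n_i+1}
    est = max(s_prev + p - lam, s_first + (j - 1) * p - remaining * eta)
    lst = min(s_prev + p + eta, s_first + (j - 1) * p + remaining * lam)
    return est, lst


# n_i = 4, p_i = 100, lambda_i = eta_i = 10, s_i^1 = 0 (so LCM = 400).
print(scheduling_window(0, 0, 2, 4, 100, 10, 10))    # (90, 110)
print(scheduling_window(0, 220, 4, 4, 100, 10, 10))  # (310, 310)
```

The second call shows how a late $s_i^3 = 220$ squeezes the window: since $s_i^5 = s_i^1 + LCM = 400$ and the last gap must be at least $p_i - \lambda_i = 90$, $s_i^4$ can be no later than 310, and the local bound from $s_i^3$ makes it no earlier than 310.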

3.2  Cyclic  Scheduling  Technique 

The basic approach to scheduling a set of synchronous periodic tasks is to consider the execution of all instances within the scheduling frame whose length is the LCM of all periods. The release times of the first periods of all tasks are zero. As long as one instance is scheduled in each period within the frame and these executions meet the timing constraints, a feasible schedule is obtained. In a feasible schedule, all instances complete their executions before the LCM.

In asynchronous task systems, on the other hand, such as the one depicted in Figure 2, in which the LCM is 200 ms, the periods of the two tasks are out of phase. The completion time of some instance in a feasible schedule may then exceed the LCM. To find a feasible schedule for such an asynchronous system, we propose a technique for handling time values that exceed the LCM.

The technique is based on the linked list structure described in [CA93]. Without loss of generality, we assume the minimum release time among the first periods of all tasks is zero. We keep a linked list for each processor and a separate list for the communication network. Each element in a list represents a time slot assigned to some instance or communication. The fields of a time slot of some processor p are:

(1) task id i and instance id j, which identify the time slot;

(2) start time st and finish time ft, which indicate the start time and completion time of $\tau_i^j$, respectively;

(3) prev ptr and next ptr, which point to the preceding and succeeding time slots, respectively.

The list is arranged in increasing order of start time, and any two time slots are nonoverlapping. Since the execution of an instance is nonpreemptable, the difference between start time and finish time equals the execution time of the task.
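A minimal Python sketch of one such list follows. It swaps the doubly linked list for a Python list kept sorted with the standard `bisect` module, which gives the same ordering and nonoverlap guarantees; the class and method names are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass
class TimeSlot:
    task_id: int      # i
    instance_id: int  # j
    start: int        # st
    finish: int       # ft


class SlotList:
    """Time slots of one processor (or of the network), nonoverlapping and
    kept in increasing order of start time."""

    def __init__(self) -> None:
        self.slots: list[TimeSlot] = []

    def _index(self, start: int) -> int:
        # Position at which a slot starting at `start` would be inserted.
        return bisect.bisect_left([s.start for s in self.slots], start)

    def is_free(self, start: int, finish: int) -> bool:
        i = self._index(start)
        if i > 0 and self.slots[i - 1].finish > start:
            return False  # overlaps the preceding slot
        if i < len(self.slots) and self.slots[i].start < finish:
            return False  # overlaps the succeeding slot
        return True

    def insert(self, slot: TimeSlot) -> None:
        if not self.is_free(slot.start, slot.finish):
            raise ValueError("time slots must not overlap")
        self.slots.insert(self._index(slot.start), slot)


slots = SlotList()
slots.insert(TimeSlot(task_id=1, instance_id=1, start=10, finish=20))
slots.insert(TimeSlot(task_id=2, instance_id=1, start=0, finish=10))
print([(s.task_id, s.start) for s in slots.slots])  # [(2, 0), (1, 10)]
```

Because the list is ordered by start time, only the two neighbors of the insertion point need to be checked for overlap, mirroring what the prev and next pointers give in the linked version.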


[Figure 3 panels: Before; After]

Figure 3: Insertion of a new time slot


3.2.1  Recurrence 

Given any solution point $(\phi, \sigma_m, \sigma_c)$, we construct the schedule by inserting time slots into the linked lists. Let $\sigma_m$: task-id × instance-id → integer. The insertion of a time slot for $\tau_i^j$ precedes that for $\tau_k^l$ if $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^l)$.

Recall that Equations 6 and 7 specify the bounds of the scheduling window for a task instance. Because of the communications, $est(\tau_i^j)$ in Equation 6 may not be the earliest possible start time for $\tau_i^j$. We define the effective start time as the time when (1) the hybrid constraints are satisfied and (2) $\tau_i^j$ has received all necessary data or messages from all its senders.

Given the effective start time $r$ and the assignment of $\tau_i^j$ (i.e., $p = \phi(\tau_i)$), a time slot of processor $p$ is assigned to $\tau_i^j$, where start_time >= $r$ and finish_time - start_time = $e_i$. Note that we have to make sure the new time slot does not overlap existing time slots. Since (1) the executions of all instances within one scheduling frame recur in the next scheduling frame and (2) the time slot for some instance may extend beyond the LCM, we subtract one LCM from the start_time or finish_time if it is greater than the LCM. That is, the time slot for this task instance is taken modulo the LCM and wrapped to the beginning of the schedule. As shown in Figure 3, the start_time of the new slot is $r$, while its completion time is $r + e_i - LCM$.
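The wrap-around rule can be sketched as a first-fit search over one frame. This is an illustration under stated assumptions, not the authors' implementation: slots are stored as `(start, finish)` pairs whose finish has already been reduced by one LCM when it wrapped, times are integers, and the execution time is at most one LCM. The names `busy_pieces` and `first_fit` are hypothetical.

```python
def busy_pieces(slots, lcm):
    """Split stored (start, finish) slots into non-wrapping pieces of [0, lcm].

    A slot whose finish is smaller than its start has wrapped past LCM
    (its finish was reduced by one LCM), so it occupies [start, lcm) and [0, finish).
    """
    pieces = []
    for st, ft in slots:
        if ft >= st:
            pieces.append((st, ft))
        else:
            pieces.extend([(st, lcm), (0, ft)])
    return pieces


def first_fit(slots, r, e, lcm):
    """Earliest (start, finish) for a slot of length e at or after time r.

    The start is kept below LCM and the finish has one LCM subtracted if it
    exceeds LCM. Returns None when no gap of length e exists in the frame.
    """
    # Unroll the busy pieces over three frames so a window that crosses
    # LCM can be compared directly against the next frame's slots.
    busy = [(a + k * lcm, b + k * lcm)
            for k in range(3) for a, b in busy_pieces(slots, lcm)]
    t = r % lcm
    limit = t + lcm
    while t < limit:
        clashes = [b for a, b in busy if a < t + e and t < b]
        if not clashes:
            start = t % lcm
            finish = start + e
            return start, (finish - lcm if finish > lcm else finish)
        t = max(clashes)        # jump past the latest clashing piece
    return None


# Frame of 100, one busy slot [10, 90); place e = 20 at or after r = 85.
print(first_fit([(10, 90)], 85, 20, 100))  # (90, 10): wraps past LCM
```

The result `(90, 10)` is the situation of Figure 3: the slot starts at 90 and, because $90 + 20 > 100$, its completion time is recorded as $90 + 20 - 100 = 10$.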

3.3  Pseudo  Instances 

As stated in Section 2, we consider the communication pattern in which cyclic dependency exists among tasks. Given a set of tasks, $\Gamma$, a set of task instances, $I$, a set of communications, $C$, and any solution point, $(\phi, \sigma_m, \sigma_c)$, we introduce pseudo instances to solve this problem. For any task $\tau_i$, if there exists a task $\tau_j$ such that (1) $\sigma_m(\tau_i^1) < \sigma_m(\tau_j^1)$, (2) $n_i = n_j$, and (3) $\tau_i \to \tau_j \in C$ and $\tau_j \to \tau_i \in C$, then a pseudo instance $\tau_i^{n_i+1}$ is added to $I$. A pseudo instance is always a receiving instance. No time slots need to be inserted for pseudo instances; only the effective start time of a pseudo instance is of concern. The effective start time of a pseudo instance in the schedule constructed from $(\phi, \sigma_m, \sigma_c)$ is checked to see whether it is less than $LCM + s_i^1$. If so, then the execution of $\tau_i$ in the next scheduling frame may start at $LCM + s_i^1$, which is exactly one LCM away from the execution of $\tau_i^1$ in the current scheduling frame. A graphical illustration of introducing a pseudo instance to handle synchronous communications with cyclic dependency is given in Figure 4, in which $n_i = 2$.

As for asynchronous communications with cyclic dependency, no pseudo instances are needed. For example, if both $\tau_x \to \tau_y$ and $\tau_y \to \tau_x$ exist and $n_x = n_y \times n$, then for each $\tau_y^j$, where $j = 1, 2, \ldots, n_y$, we find a sending instance $\tau_x^i \in I$ and a receiving instance $\tau_x^k \in I$ such that (1) $f_x^i \le s_y^j$, (2) $f_y^j \le s_x^k$, and (3) $\tau_x^i \to \tau_y^j$ and $\tau_y^j \to \tau_x^k$ are the communications. The relationship between $i$, $j$, and $k$ can be stated as

$$(j-1) \times n < i < k \le j \times n. \tag{8}$$

A graphical illustration can be found in Figure 5. In the example, the values of $i$, $j$, $k$, and $n$ are 6, 2, 8, and 4, respectively. The communications $\tau_x^6 \to \tau_y^2$ and $\tau_y^2 \to \tau_x^8$ are scheduled before and after the scheduling of $\tau_y^2$, respectively.

Figure 5: Asynchronous communications in mutuality

4  The  Simulated  Annealing  Algorithm 

Kirkpatrick et al. [KGV83] proposed a simulated annealing algorithm for combinatorial optimization problems. Simulated annealing is a global optimization technique. It is derived from the observation that an optimization problem can be identified with a fluid: there is an analogy between finding an optimal solution of a combinatorial problem with many variables and the slow cooling of a molten metal until it reaches its low-energy ground state. Hence, the terms energy function, temperature, and thermal equilibrium are commonly used. During the search for an optimal solution, the algorithm always accepts downward moves from the current solution point to points of lower energy, while there is still a small chance of accepting upward moves to points of higher energy. The probability of accepting an uphill move is a function of the current temperature. The purpose of hill climbing is to escape from a locally optimal configuration. If there are no upward or downward moves over a number of iterations, thermal equilibrium is reached. The temperature is then reduced and the search continues from the current solution point. The whole process terminates when either (1) the lowest-energy point is found or (2) no upward or downward jumps have been taken for a number of successive thermal equilibria.

The structure of the simulated annealing (SA) algorithm is shown in Figure 7. The first step of the algorithm is to randomly choose an assignment $\phi$, a total ordering of instances within one scheduling frame, $\sigma_m$, and a total ordering of communications for the instances, $\sigma_c$. A solution point in the search space of SA is a 3-tuple $(\phi, \sigma_m, \sigma_c)$. The energy of a solution point is computed by Equation (5). For each solution point $P$ that is infeasible (i.e., $E_P$ is nonzero), a neighbor-finding strategy is invoked to generate a neighbor $P'$ of $P$. As stated before, if the energy of the neighbor is lower than the current value, we accept the neighbor as the current solution; otherwise, a probability function, $\exp(-(E_{P'} - E_P)/T)$, is evaluated to determine whether to accept the neighbor. The parameter of the probability function is the current temperature $T$. As the temperature decreases, the chance of accepting an uphill jump (i.e., a solution point with a higher energy level) becomes smaller. The inner and outer loops are for thermal equilibrium and termination, respectively. The number of iterations for the inner loop is also a function of the current temperature: the lower the temperature, the larger the number. Methods for modeling the number of iterations and assigning it for each temperature have been proposed [LH91]. In this dissertation, we consider a simple incremental function, namely $N = N + A$, where $N$ is the number of iterations and $A$ is a constant. The termination condition for the outer loop is $E_P = 0$. Whenever thermal equilibrium is reached at a temperature, the temperature is decreased. The temperature decrease function may be linear or nonlinear, simple or complex; here we consider a simple multiplicative function (i.e., $T = T \times \alpha$, where $\alpha < 1$).
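The loop structure described above, with Metropolis-style acceptance, an inner-loop length that grows by a constant $A$, and geometric cooling $T = T \times \alpha$, can be sketched generically in Python. The parameter defaults, the `max_frozen` stall limit, and the toy problem are assumptions for illustration; in the setting of this paper the state would be a solution point $(\phi, \sigma_m, \sigma_c)$ and `energy` would evaluate Equation 5.

```python
import math
import random


def anneal(initial, energy, neighbor, t0=10.0, alpha=0.9, n0=20, a=10,
           max_frozen=5, seed=0):
    """Generic simulated-annealing loop.

    energy(x) -> non-negative number (0 = feasible);
    neighbor(x, rng) -> a perturbed copy of x.
    """
    rng = random.Random(seed)
    current, e_cur = initial, energy(initial)
    t, n, frozen = t0, n0, 0
    while e_cur > 0 and frozen < max_frozen:     # outer loop: termination
        jumped = False
        for _ in range(n):                       # inner loop at temperature t
            cand = neighbor(current, rng)
            e_new = energy(cand)
            # Downhill (or level) moves are always taken; uphill moves are
            # taken with probability exp(-(E_new - E_cur) / T).
            if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
                jumped = jumped or e_new != e_cur
                current, e_cur = cand, e_new
                if e_cur == 0:
                    break
        frozen = 0 if jumped else frozen + 1     # count stalled equilibria
        t = max(t * alpha, 1e-9)                 # T = T x alpha, alpha < 1
        n += a                                   # N = N + A
    return current, e_cur


# Toy problem: reach x = 7 by unit steps; the energy is the distance to 7.
print(anneal(0, lambda x: abs(x - 7), lambda x, rng: x + rng.choice((-1, 1))))  # (7, 0)
```

The `frozen` counter plays the role of termination condition (2) above, so the loop also stops on problems where zero energy is never reached.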


4.1  Evaluation of Energy Value for a Solution Point $(\phi, \sigma_m, \sigma_c)$

The energy value in Equation 5 is computed by constructing the multiprocessor schedules and a network schedule, and collecting the start and completion times of each task instance and communication from these schedules.

The construction of the schedules is characterized by the priority assignment of the task instances in the set. The priority assignment algorithm determines the scheduling order among all the task instances. Each time a task instance is chosen to be scheduled, the incoming communications of the instance are scheduled first, and then the task instance itself. After all the task instances have been scheduled, the outgoing communications are scheduled. An algorithmic description of how to compute the energy value for a solution point is given in Figure 6. Note that a communication is an incoming communication to a task instance if the frequency of the receiving task instance is equal to or less than that of the sending task instance. For example, $\tau_i^? \to \tau_j^k$ (with $n_i > n_j$) and $\tau_i^k \to \tau_j^k$ (with $n_i = n_j$) are incoming communications to $\tau_j^k$. On the other hand, if the sender frequency is less than the receiver frequency, then the communication is an outgoing communication (e.g., $\tau_i^k \to \tau_j^?$, with $n_i < n_j$, is an outgoing communication of $\tau_i^k$).


4.1.1  Priority  Assignment  of  Task  Instances: 

In [CA93], we presented the SLsF algorithm and its performance evaluation. The results showed that SLsF outperforms SPF and SJF. In this paper we use SLsF as the priority assignment algorithm for the task instances in $I$.

Formally, if $lst(\tau_i^j) < lst(\tau_k^l)$, then $\sigma_m(\tau_i^j) < \sigma_m(\tau_k^l)$, and the insertion of a time slot for $\tau_i^j$ precedes that for $\tau_k^l$. The time-based scheduling algorithm for a task instance is used to find a time slot for the instance once its effective start time is given. We define the effective start time of a task instance as its earliest start time when the incoming communications are taken into account. Let $t$ be the maximum completion time among all the incoming communications of a task instance; then the effective start time of the task instance is set to the larger of $t$ and est (as stated in Equation 6).
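The priority rule and the effective start time can be sketched as follows. The tuple layout and function names are hypothetical, and ties in lst are assumed to keep their input order (Python's sort is stable).

```python
def slsf_order(instances):
    """Order instances by smallest latest start time (lst) first.

    instances: [(task_id, instance_id, lst)]; equal lst values keep their
    input order because Python's sort is stable (a tie-breaking assumption).
    """
    return sorted(instances, key=lambda inst: inst[2])


def effective_start(est, incoming_finishes):
    """Larger of est (Equation 6) and the latest incoming completion time t."""
    return max([est, *incoming_finishes])


order = slsf_order([("A", 1, 50), ("B", 1, 20), ("C", 1, 30)])
print([task for task, _, _ in order])  # ['B', 'C', 'A']
print(effective_start(40, [35, 55]))   # 55
```

An instance with no incoming communications simply falls back to its est.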


4.1.2  Scheduling the Incoming Communications: $\sigma_c$

There are two kinds of incoming communications. The first kind is the synchronous communication, in which the frequencies of the sender and receiver are identical. The other kind is the asynchronous communication, in which the sending task instance is marked with a question mark. For such an asynchronous communication, we have to decide which instance of the sending task should communicate with the receiving task instance. Our approach is to find the nearest instance of the sending task. By choosing the nearest instance, the time difference between the start time of the receiving instance and the completion time of the sending instance is the smallest, so the chance of violating the latency constraint of the communication is also the smallest.

The nearest instance of a sending task can be found using the following method. Given an incoming communication from a sending task $\tau_k$ to a receiving instance $\tau_i^j$, and the effective start time $eft$ of $\tau_i^j$, we search through the linked list of processor $\phi(\tau_k)$ up to time $eft$. If there is some instance of $\tau_k$, say $\tau_k^l$, whose completion time is the latest among all scheduled instances of $\tau_k$ up to $eft$, then the nearest instance is found. Otherwise, we continue to search through the linked list until an instance of $\tau_k$ is found. We set the effective start time of the communication to be the completion time of the found instance. We also erase the question mark, so that the communication from $\tau_k^{?}$ to $\tau_i^j$ becomes a communication from $\tau_k^l$ to $\tau_i^j$. For a synchronous communication, the effective start time of the communication is simply the finish time of the sending task instance.
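The search for the nearest sending instance can be sketched in Python. This assumes, for illustration only, that a processor's linked list is modeled as a Python list of non-overlapping slots sorted by start time; the `Slot` layout and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

# One scheduled slot on a processor's timeline: (start, finish, task, instance index).
Slot = Tuple[float, float, str, int]


def nearest_sender_instance(
    timeline: List[Slot], sender: str, eft: float
) -> Optional[Tuple[int, float]]:
    """Return (instance index, completion time) of the nearest instance of `sender`.

    The timeline is sorted by start time and its slots do not overlap, so
    completion times are sorted too. Prefer the latest instance that completes
    by `eft`; if none exists, fall back to the first instance found after `eft`.
    """
    best: Optional[Tuple[int, float]] = None
    for _start, finish, task, idx in timeline:
        if task != sender:
            continue
        if finish <= eft:
            best = (idx, finish)      # later slots complete later, so keep the newest
        elif best is None:
            return (idx, finish)      # nothing before eft: continue past it
        else:
            break                     # the latest instance before eft is already found
    return best
```

The returned completion time then serves as the effective start time of the communication.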

The scheduling of a communication is done by inserting a time slot into the linked list for the communications network. The start time of the time slot cannot be earlier than the effective start time of the communication. Once the time slot is inserted, we check the effective start time of the receiving instance $\tau_i^j$ to make sure that it is not less than the finish time of the time slot. If it is, the effective start time of $\tau_i^j$ is updated to be the finish time of the time slot.
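The linked list for the communications network can be modeled, for illustration, as a sorted list of busy intervals. The sketch below assumes a first-fit policy (the earliest gap at or after the communication's effective start time); the text does not say how a slot is chosen among several free gaps, so that policy and the function names are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, finish) of a busy slot


def insert_slot(busy: List[Interval], earliest: float, length: float) -> Interval:
    """First-fit insertion of a slot of `length` starting no earlier than `earliest`.

    `busy` must be sorted and non-overlapping; it is updated in place.
    """
    start = earliest
    for s, f in busy:
        if f <= start:
            continue                 # this busy slot ends before the candidate
        if s >= start + length:
            break                    # the gap before this busy slot is large enough
        start = f                    # overlap: move the candidate past this slot
    slot = (start, start + length)
    busy.append(slot)
    busy.sort()
    return slot


def update_receiver_start(receiver_eft: float, slot: Interval) -> float:
    """The receiving instance cannot start before the communication finishes."""
    return max(receiver_eft, slot[1])


bus = [(0.0, 2.0), (3.0, 5.0)]
slot = insert_slot(bus, earliest=1.0, length=1.0)
print(slot)                                # (2.0, 3.0)
print(update_receiver_start(2.5, slot))    # 3.0
```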

If a task instance has more than one incoming communication, the scheduling order among these communications is based on their latency constraints. The larger the latency value, the earlier the communication is scheduled; the incoming communication with the tightest latency constraint is scheduled last. This is because the effective start time of the receiving task instance is constantly updated by the scheduling of the incoming communications. Scheduling a later incoming communication may increase the effective start time of the receiving task instance and cause an earlier-scheduled communication to violate its latency constraint if that constraint is tight.
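The ordering rule can be sketched as follows. This is a simplified model, assuming a single bus that serves communications back to back in the given order rather than the full linked-list insertion; `Comm` and `schedule_incoming` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Comm(NamedTuple):
    name: str
    ready: float     # effective start time: the sending instance's finish time
    duration: float  # transmission time on the bus
    latency: float   # latency constraint of the communication


def schedule_incoming(
    comms: List[Comm], receiver_est: float, bus_free: float = 0.0
) -> Tuple[float, List[str]]:
    """Schedule the incoming communications of one receiving instance.

    Largest latency first, tightest latency last. After each communication, the
    receiver's effective start time is pushed to at least its finish time.
    """
    order = sorted(comms, key=lambda c: c.latency, reverse=True)
    est = receiver_est
    for c in order:
        start = max(c.ready, bus_free)
        bus_free = start + c.duration
        est = max(est, bus_free)
    return est, [c.name for c in order]


print(schedule_incoming([Comm("x", 0.0, 2.0, 3.0), Comm("y", 0.0, 1.0, 10.0)], 0.0))
# (3.0, ['y', 'x'])
```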

4.1.3 Scheduling the Outgoing Communications: $\sigma_c$

The  scheduling  of  the  outgoing  communications  for  the  whole  task  set  is  performed  after  all  the 
task  instances  have  been  scheduled.  The  scheduling  order  among  these  communications  is  based 
on  the  finish  times  of  the  sending  task  instances.  The  task  instance  with  the  smallest  finish  time  is 
considered  first.  When  a  task  instance  is  taken  into  account,  all  its  outgoing  communications  are 
scheduled  one  by  one  according  to  their  latency  constraints.  The  communication  with  the  tightest 
latency  constraint  is  scheduled  first. 

Given an outgoing communication of $\tau_i^j$ and the finish time $f_i^j$ of $\tau_i^j$, the effective start time of the communication is set to $f_i^j$. Based on the effective start time, a time slot is inserted for this communication. Then the nearest instance of the receiving task can be found based on the finish time of the time slot.
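The ordering of the outgoing communications and the choice of the receiving instance can be sketched in Python. The sketch assumes that the nearest receiving instance is the first one starting at or after the finish time of the communication's slot, and it breaks ties between senders with equal finish times by latency; both are interpretations, and the names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Outgoing(NamedTuple):
    name: str
    sender_finish: float  # finish time of the sending task instance
    latency: float        # latency constraint of the communication


def outgoing_order(comms: List[Outgoing]) -> List[str]:
    """Smallest sender finish time first; for one sender, tightest latency first."""
    return [c.name for c in sorted(comms, key=lambda c: (c.sender_finish, c.latency))]


def nearest_receiver_start(receiver_starts: List[float], slot_finish: float) -> Optional[float]:
    """Start time of the first receiving instance that begins at or after the slot finishes."""
    later = [s for s in sorted(receiver_starts) if s >= slot_finish]
    return later[0] if later else None


print(outgoing_order([Outgoing("x", 5.0, 50.0), Outgoing("y", 2.0, 80.0), Outgoing("z", 5.0, 10.0)]))
# ['y', 'z', 'x']
print(nearest_receiver_start([0.0, 25.0, 50.0, 75.0], 30.0))  # 50.0
```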

For the example shown in Figure 5, the incoming communication marked with “(1)” is scheduled before the scheduling of its receiving task instance, and the sixth instance of the sending task is chosen as the nearest instance. The outgoing communication marked with “(3)” is scheduled after the task instances have been scheduled, and in this example the nearest instance of the receiving task is chosen as its destination.


4.2 Neighbor Finding Strategy: $\phi$

The neighbor finding strategy is used to find the next solution point once the current solution point is evaluated as infeasible (i.e., its energy value is positive). The neighbor space of a solution point is the set of points that can be reached by changing the assignment of one or two tasks. There are several modes of the neighbor finding strategy:


• Balance Mode: We randomly move a task from the heaviest-loaded processor to the lightest-loaded processor. This move tries to balance the workload of the processors; with a balanced workload, the chance of finding a neighbor with a lower energy value is higher.

• Swap Mode: We randomly choose two tasks $\tau_i$ and $\tau_j$ on processors $p$ and $q$ respectively. Then we change $\phi$ by setting $\phi(\tau_i) = q$ and $\phi(\tau_j) = p$.

• Merge Mode: We pick two tasks and move them to one processor. Merging two tasks onto a processor increases its workload and may therefore raise the energy level of the new point. The purpose of the move is to perturb the system and allow the next move to escape from a local optimum.

• Direct Mode: When the system is in a low-energy state, only a few tasks violate the jitter or latency constraints. Under such circumstances, it is more beneficial to change the assignment of these tasks instead of randomly moving other tasks. From the conducted experiments, we find that this mode can accelerate the search for a feasible solution, especially when the system is about to reach equilibrium.
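The four modes can be sketched as moves on the assignment $\phi$, represented here as a Python dict from task name to processor index. This is a hedged illustration: the per-task load values, the random choices, and all function names are assumptions, and `direct_move` simply sends one violating task to a different random processor because the text does not say how the new processor is chosen.

```python
import random
from typing import Dict, Iterable, List

Assignment = Dict[str, int]  # task name -> processor index (the mapping phi)


def processor_loads(phi: Assignment, load: Dict[str, float], n_proc: int) -> List[float]:
    """Total load per processor under the assignment phi."""
    totals = [0.0] * n_proc
    for task, p in phi.items():
        totals[p] += load[task]
    return totals


def balance_move(phi: Assignment, load: Dict[str, float], n_proc: int,
                 rng: random.Random) -> Assignment:
    """Move a random task from the heaviest- to the lightest-loaded processor."""
    totals = processor_loads(phi, load, n_proc)
    heavy = max(range(n_proc), key=totals.__getitem__)
    light = min(range(n_proc), key=totals.__getitem__)
    new = dict(phi)
    on_heavy = sorted(t for t, p in phi.items() if p == heavy)
    if heavy != light and on_heavy:
        new[rng.choice(on_heavy)] = light
    return new


def swap_move(phi: Assignment, rng: random.Random) -> Assignment:
    """Exchange the processors of two tasks that sit on different processors."""
    new = dict(phi)
    a = rng.choice(sorted(phi))
    others = sorted(t for t in phi if phi[t] != phi[a])
    if others:
        b = rng.choice(others)
        new[a], new[b] = phi[b], phi[a]
    return new


def merge_move(phi: Assignment, n_proc: int, rng: random.Random) -> Assignment:
    """Move two randomly chosen tasks onto one randomly chosen processor."""
    new = dict(phi)
    a, b = rng.sample(sorted(phi), 2)
    target = rng.randrange(n_proc)
    new[a] = new[b] = target
    return new


def direct_move(phi: Assignment, violating: Iterable[str], n_proc: int,
                rng: random.Random) -> Assignment:
    """Reassign one task that currently violates a jitter or latency constraint."""
    new = dict(phi)
    candidates = sorted(violating)
    if candidates and n_proc > 1:
        t = rng.choice(candidates)
        new[t] = rng.choice([p for p in range(n_proc) if p != phi[t]])
    return new
```

Every move returns a new dict, so the current solution point stays intact if the neighbor is rejected.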

The selection of the appropriate mode to find a neighbor is based on the current system state. Given a randomly generated initial state (i.e., solution point), the workload discrepancy between the processors may be huge. Hence, in the early stage of the simulated annealing, the balance mode is useful to balance the workload. After the processor workload is balanced out, the swap mode and the merge mode are frequently used to find a lower energy state until the system reaches a near-termination state. In the final stage of the annealing, the direct mode tries to find a feasible solution. The whole process terminates when a feasible solution, whose energy value is zero, is found.

5  Experimental  Results 

We implemented the algorithm as the framework of the allocator on MARUTI [GMK+91, MSA92, SdSA94], a real-time operating system developed at the University of Maryland, and conducted extensive experiments under various task characteristics. The tests involve the allocation of real-time tasks on a homogeneous distributed system connected by a communication channel.

To test the practicality of the approach and show the significance of the algorithm, we consider a simplified and sanitized version of a real problem. It was derived from actual development work and is therefore representative of the scheduling requirements of an actual avionics system. The Boeing 777 Aircraft Information Management System (AIMS) is to run on a multiprocessor system connected by a SafeBus (TM) ultra-reliable bus. The problem is to find the minimum number of processors needed and an assignment of the tasks to these processors. The objective is to develop an off-line non-preemptable schedule for each processor and one schedule for the SafeBus (TM) ultra-reliable bus.

Number of processors   Execution time (s)   Hr:Min:Sec
10                     2369                 0:39:29
9                      5572                 1:32:52
8                      19774                5:29:34
7                      36218                10:03:38
6                      78647                21:50:47

Table 1: The execution times of the AIMS with different numbers of processors

The AIMS consists of 155 tasks and 951 communications between these tasks. The frequencies of the tasks vary from 5 Hz to 40 Hz. The execution times of the tasks vary from 0 ms to 16.650 ms. The NEI and XEI of a task $\tau_i$ are $p_i - 500$ μs and $p_i + 500$ μs respectively. Since $\delta = 1000$ μs $= 1$ ms $< \ldots$, the smallest-period-first scheduling algorithm can be used in this case. Tasks communicate with one another asynchronously. The transmission times of the communications range from 0 μs to 447.733 μs. The latency constraints of the communications vary from 68.993 ms to 200 ms. The LCM of the periods of these 155 tasks is 200 ms. When the whole system is unrolled over one LCM, the total number of task instances within one scheduling frame is 624 and the number of communications is 1580.
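Unrolling the tasks over one LCM is the step that turns the 155 tasks into 624 task instances. A small Python sketch of that step, assuming integer periods in microseconds; the four periods in the usage line are hypothetical (5 Hz to 40 Hz, the range quoted above), not the actual AIMS task set.

```python
from functools import reduce
from math import gcd
from typing import List, Tuple


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def frame_and_instances(periods_us: List[int]) -> Tuple[int, List[int]]:
    """Scheduling frame (LCM of the periods) and instances per task in one frame."""
    frame = reduce(lcm, periods_us)
    return frame, [frame // p for p in periods_us]


# Hypothetical periods for 5, 10, 20 and 40 Hz tasks, in microseconds.
print(frame_and_instances([200_000, 100_000, 50_000, 25_000]))
# (200000, [1, 2, 4, 8])
```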

For a real problem of this size, pre-analysis is necessary. We calculate the resource utilization index to estimate the minimum number of processors needed to run AIMS. The index is defined as

$\frac{\sum_{i} (e_i \times q_i)}{LCM}$

where $e_i$ is the execution time of task $\tau_i$ and $q_i = LCM / p_i$. The obtained index for AIMS is 5.14. Since a processor can supply at most one LCM of execution time per scheduling frame, there exists no feasible solution for the AIMS if the multiprocessor system has fewer than 6 processors.
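The index and the resulting lower bound on the number of processors can be computed directly. A minimal sketch, assuming execution times and periods are given in microseconds as (e_i, p_i) pairs; the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def utilization_index(tasks: Iterable[Tuple[int, int]], lcm_us: int) -> float:
    """Resource utilization index: sum of e_i * q_i over all tasks, divided by LCM.

    `tasks` holds (execution time e_i, period p_i) pairs in microseconds, and
    q_i = LCM / p_i is the number of instances of task i in one frame.
    """
    return sum(e * (lcm_us // p) for e, p in tasks) / lcm_us


def processor_lower_bound(index: float) -> int:
    """Each processor supplies at most one LCM of execution time per frame."""
    return math.ceil(index)


print(processor_lower_bound(5.14))  # 6, the AIMS figure quoted above
```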

The number of processors on which the AIMS is allowed to run is a parameter of the scheduling problem. We start the AIMS scheduling problem with 10 processors. After a feasible solution is found, we decrease the number of processors by one and solve the whole problem again. We run the algorithm on a DECstation 5000. The execution time for the AIMS scheduling problem with different numbers of processors is summarized in Table 1. The algorithm is able to find a feasible solution for the AIMS with six processors, which is the minimum number of processors according to the resource utilization index. The time to find such a feasible solution is less than one day (approximately 22 hours).
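As a quick arithmetic check of Table 1 and of the "less than one day" claim, the conversion from seconds can be reproduced in Python (`to_hms` is a helper written for this illustration):

```python
def to_hms(seconds: int) -> str:
    """Format a duration in seconds as H:MM:SS, as in Table 1."""
    minutes, sec = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{sec:02d}"


exec_times = {10: 2369, 9: 5572, 8: 19774, 7: 36218, 6: 78647}  # Table 1, seconds
for procs, secs in exec_times.items():
    print(procs, to_hms(secs), secs < 24 * 3600)
```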


5.1  Discussions 


For the feasible solutions of the AIMS with various numbers of processors, we calculate the processor utilization ratio (PUR) of each processor. The processor utilization ratio for a processor $p$ is defined as

$PUR_p = \frac{\sum_{\tau_i : \phi(\tau_i) = p} (e_i \times q_i)}{LCM}$

The results are shown in Figure 8. For a fixed number of processors, the ratios are sorted into non-decreasing order. The algorithm generates feasible solutions for the AIMS with 6, 7, 8, 9 and 10 processors. For example, in the 6-processor case, the PURs of the heaviest-loaded and lightest-loaded processors are 0.91 and 0.76 respectively. In the 10-processor case, the PURs are 0.63 and 0.28 respectively. We find that the ratio difference between the heaviest-loaded processor and the lightest-loaded processor in the 6-processor case is smaller than in the other cases. This suggests that when fewer processors are available, a feasible solution is more likely to require a load-balanced allocation.
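The PUR of each processor follows from the same per-task terms $e_i \times q_i$, grouped by the assignment $\phi$. A minimal sketch with a hypothetical three-task example (not AIMS data); the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def processor_utilization(
    tasks: Dict[str, Tuple[int, int]], phi: Dict[str, int], n_proc: int, lcm_us: int
) -> List[float]:
    """PUR of each processor, sorted into non-decreasing order.

    `tasks` maps a task name to (e_i, p_i) in microseconds and `phi` maps it to a
    processor; each task contributes e_i * q_i with q_i = LCM / p_i.
    """
    busy = [0] * n_proc
    for name, (e, p) in tasks.items():
        busy[phi[name]] += e * (lcm_us // p)
    return sorted(b / lcm_us for b in busy)


ratios = processor_utilization(
    {"a": (10_000, 25_000), "b": (20_000, 50_000), "c": (5_000, 50_000)},
    {"a": 0, "b": 1, "c": 1}, n_proc=2, lcm_us=50_000,
)
print(ratios)  # [0.4, 0.5]
```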

The detailed schedules for the 6-processor case are shown in Figure 9. The results are shown on an interactive graphical interface which was developed for the design of MARUTI. The time unit shown in Figure 9 is 100 μs, so the LCM is shown as 2000 in the figure (i.e., 2000 × 100 μs = 200 ms). This solution consists of seven off-line non-preemptive schedules: one for each processor and one for the SafeBus (TM). Each of these schedules is one LCM long, and an infinite schedule can be produced by repeating these schedules indefinitely. Note that pseudo instances are introduced to ensure that the wrap-around at the end of the LCM-long schedules satisfies the latency and next-execution-interval requirements across the point of wrap-around. The pseudo instances are not shown in Figure 9.

Resource and memory constraints can be included in the problem by modifying the neighbor-finding strategy. Once a neighbor of the current point is generated, it is checked to ascertain that the constraints on memory, etc., are met. If not, the neighbor is discarded and another neighbor is evaluated.
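The discard-and-retry rule can be sketched as a wrapper around any neighbor generator. This assumes per-task memory requirements and per-processor capacities as plain integers; the `max_tries` cutoff and all names are additions for this sketch.

```python
import random
from typing import Callable, Dict, Optional

Assignment = Dict[str, int]  # task name -> processor index


def constrained_neighbor(
    phi: Assignment,
    propose: Callable[[Assignment, random.Random], Assignment],
    memory_need: Dict[str, int],
    capacity: Dict[int, int],
    rng: random.Random,
    max_tries: int = 100,
) -> Optional[Assignment]:
    """Draw neighbors until one respects every processor's memory capacity.

    Rejected neighbors are simply discarded. Returns None if `max_tries`
    proposals all fail (a cutoff added for this sketch).
    """
    for _ in range(max_tries):
        cand = propose(phi, rng)
        used: Dict[int, int] = {}
        for task, p in cand.items():
            used[p] = used.get(p, 0) + memory_need[task]
        if all(used[p] <= capacity[p] for p in used):
            return cand
    return None
```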


References 

[CA93] Sheng-Tzong Cheng and Ashok K. Agrawala. Scheduling of periodic tasks with relative timing constraints. Technical Report CS-TR-3392, UMIACS-TR-94-135, Department of Computer Science, University of Maryland, College Park, December 1993. Submitted to the 10th Annual IEEE Conference on Computer Assurance, COMPASS '95.


[CDHC94] T. Carpenter, K. Driscoll, K. Hoyme, and J. Carciofini. ARINC 659 scheduling: Problem definition. In Proceedings of IEEE Real-Time Systems Symposium, San Juan, PR, Dec. 1994.

[GMK+91] O. Gudmundsson, D. Mosse, K.T. Ko, A.K. Agrawala, and S.K. Tripathi. Maruti: A platform for hard real-time applications. In K. Gordon, A.K. Agrawala, and P. Hwang, editors, Mission Critical Operating Systems. IOS Press, 1991.

[HS92] Chao-Ju Hou and Kang G. Shin. Allocation of periodic task modules with precedence and deadline constraints in distributed real-time systems. In Proceedings of the 1992 IEEE 13th Real-Time Systems Symposium, pages 146-155, Phoenix, AZ, 1992.

[KGV83]  S.  Kirkpatrick,  C.  D.  Gelatt,  and  M.  P.  Vecchi.  Optimization  by  simulated  annealing. 
Science,  220(4598):671-680,  May  1983. 

[LH91] Feng-Tse Lin and Ching-Chi Hsu. Task assignment problems in distributed computing systems by simulated annealing. Journal of the Chinese Institute of Engineers, 14(5):537-550, Sept. 1991.

[MSA92] Daniel Mosse, M.C. Saksena, and Ashok K. Agrawala. Maruti: An approach to real-time system design. Technical Report CS-TR-2845, UMIACS-TR-92-21, Department of Computer Science, University of Maryland, College Park, 1992.

[Ram90] Krithi Ramamritham. Allocation and scheduling of complex periodic tasks. In Proceedings of the 10th International Conference on Distributed Computing Systems, pages 108-115, Paris, France, 1990.

[SdSA94] M. Saksena, J. da Silva, and A. K. Agrawala. Design and implementation of Maruti-II. Technical Report CS-TR-2845, Department of Computer Science, University of Maryland, College Park, 1994.

[TBW92] K. Tindell, A. Burns, and A. J. Wellings. Allocating hard real-time tasks: an NP-hard problem made easy. Real-Time Systems, 4(2):145-165, June 1992.


Given a solution point P = (φ, σ_m, σ_c)

While there is some unscheduled task instance do
    Find the next unscheduled instance.   /* by the SLsF algorithm */
    Let the instance be τ_i^j.
    Sort all the incoming communications of τ_i^j by latency value in descending order.
    Schedule each incoming communication, from the largest-latency one
        to the tightest-latency one.
    Schedule the instance τ_i^j.
End While

Mark each instance as unexamined.

While there is some unexamined task instance do
    Find the next unexamined task instance.   /* by the finish times */
    Sort all the outgoing communications of the task instance by latency value
        in ascending order.
    Schedule each outgoing communication, from the tightest-latency one
        to the largest-latency one.
    Mark the task instance as examined.
End While

Collect the start time and finish time information for each task instance and
compute the energy value using Equation 5.

Figure 6: The pseudo code for computing the energy value




Choose an initial temperature T
Choose randomly a starting point P = (φ, σ_m, σ_c)
Ep := energy of solution point P
if Ep = 0 then
    output Ep and exit   /* Ep = 0 means a feasible solution */
end if
repeat
    repeat
        Choose N, a neighbor of P
        En := energy of solution point N
        if En = 0 then
            output En and exit   /* En = 0 means a feasible solution */
        end if
        if En < Ep then
            P := N
            Ep := En
        else
            if e^(-(En - Ep)/T) > random(0, 1) then
                P := N
                Ep := En
            end if
        end if
    until thermal equilibrium at T
    T := α × T   (where α < 1)
until stopping criterion

Figure 7: The structure of the simulated annealing algorithm
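Figure 7 can be turned into a runnable Python sketch. This is a generic Metropolis-style annealing loop under stated assumptions: a fixed number of moves per temperature stands in for "thermal equilibrium", the loop stops when T falls below `t_min`, and the toy bit-vector problem in the usage lines is not from the paper. In the allocator, the energy would come from the procedure of Figure 6 and the neighbor from the modes of Section 4.2.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def simulated_annealing(
    start: S,
    energy: Callable[[S], float],
    neighbor: Callable[[S, random.Random], S],
    t0: float = 10.0,
    alpha: float = 0.9,
    moves_per_temp: int = 100,
    t_min: float = 1e-3,
    seed: int = 0,
) -> Tuple[S, float]:
    """Annealing loop in the shape of Figure 7; energy 0 means a feasible solution.

    A fixed number of moves per temperature stands in for "thermal equilibrium",
    and reaching `t_min` is the stopping criterion; both are assumptions.
    """
    rng = random.Random(seed)
    p, ep = start, energy(start)
    t = t0
    while ep > 0 and t > t_min:
        for _ in range(moves_per_temp):
            n = neighbor(p, rng)
            en = energy(n)
            if en == 0:
                return n, en
            # Downhill moves are always taken; uphill ones with probability e^{-(En-Ep)/T}.
            if en < ep or math.exp(-(en - ep) / t) > rng.random():
                p, ep = n, en
        t *= alpha  # T := alpha * T with alpha < 1
    return p, ep


def flip(bits, rng):
    """Neighbor move for the toy problem: flip one random bit."""
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


# Toy problem (not from the paper): find 8 bits with exactly five ones.
best, e = simulated_annealing((0,) * 8, lambda b: abs(sum(b) - 5), flip)
print(best, e)
```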
Figure 9: The Allocation Results and Schedules for AIMS with 6
Abstract

The problem of non-preemptive scheduling of a set of periodic tasks on a single processor has traditionally been formulated in terms of the ready time and deadline of each task. As a consequence, a feasible schedule ensures that in each period one instance of each task starts execution after its ready time and completes execution before its deadline.

Recently, real-time systems have emerged whose timing requirements impose relative timing constraints on consecutive executions of each task. In this paper, we consider the scheduling problem of periodic tasks with relative timing constraints imposed on two consecutive executions of a task. We analyze the timing constraints and derive the scheduling window for each task instance. Based on the scheduling window, we present a time-based approach to scheduling a task instance. The task instances are scheduled one by one based on priorities assigned by the algorithms proposed in this paper. We conduct experiments to compare the schedulability of the algorithms.
interpreted as representing the official policies, either expressed or implied, of Honeywell or Army/Phillips


1  Introduction 


The task scheduling problem is one of the basic issues in building real-time applications in which the tasks of the applications are associated with timing constraints. For hard real-time applications, such as avionics systems and nuclear power systems, the approach to guaranteeing the critical timing constraints is to schedule periodic tasks a priori. A non-preemptive schedule for a set of periodic tasks is generated by assigning a start time to each execution of a task so that the timing constraints are met. Failure to meet the specified timing constraints can result in disastrous consequences.

Various kinds of periodic task models have been proposed to represent real-time system characteristics. One of them models an application as a set of tasks, in which each task is executed once every period under ready time and deadline constraints. These constraints impose constant intervals in which a task can be executed. In the literature, many techniques [2, 3, 4, 5, 6, 7, 8] have been proposed to solve the scheduling problem in this context. The deficiency of this model is its inability to specify relative constraints across task periods. For example, one cannot specify the timing relationship between two consecutive executions of the same task.


Simply ensuring that one instance of each task starts execution after the ready time and completes execution before the specified deadline is not enough. Some real-time applications have more complicated timing constraints for the tasks. For example, relative timing constraints may be imposed on the consecutive executions of a task, so that two consecutive executions of a periodic task must be separated by a minimum execution interval. The Boeing 777 Aircraft Information Management System is such an example [1]. One possible solution to the scheduling problem of such applications is to consider the instances of tasks rather than the tasks themselves. A task instance is defined as one execution of a task within a period. With the notion of task instances, one is able to specify various timing constraints and dependencies among instances of tasks.

In this paper, we consider the relative timing constraints imposed on two consecutive instances of a task. The task model and the analysis of the timing constraints are introduced in Sections 2 and 3 respectively. Based on the analysis, we derive the scheduling window for each task instance. Given the scheduling window of a task instance, we present the time-based approach to scheduling a task instance in Section 4. We propose three priority assignment algorithms for the task instances in Section 5. The task instances are scheduled one by one based on their priorities. In Section 6, we evaluate the three algorithms and show the experimental results.


2  Problem  Statement 


Consider a set of periodic tasks $\Gamma = \{ \tau_i \mid i = 1, \ldots, n \}$, where $\tau_i$ is a 4-tuple $\langle p_i, e_i, \lambda_i, \eta_i \rangle$ denoting the period, computation time, low jitter and high jitter respectively. One instance of a task is executed each period. The execution of a task instance is non-preemptable. The start times of two consecutive instances of task $\tau_i$ are at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart.

In order to schedule periodic tasks, we consider the least common multiple (LCM) of all periods of tasks. Let $n_i$ be the number of instances of task $\tau_i$ within a schedule of length LCM. Hence, $n_i = LCM / p_i$. A schedule for a set of tasks is the mapping of each task $\tau_i$ to $n_i$ task instances and the assignment of a start time $s_i^j$ to the $j$-th instance of task $\tau_i$, $\tau_i^j$, for all $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. A feasible schedule is a schedule in which the following conditions are satisfied for each task $\tau_i$:


$f_i^j = s_i^j + e_i$   (1)

$s_i^{n_i+1} = s_i^1 + LCM$   (2)

$s_i^j \ge s_i^{j-1} + p_i - \lambda_i$   (3)

$s_i^j \le s_i^{j-1} + p_i + \eta_i$   (4)

$\forall j = 2, \ldots, n_i + 1$.

The non-preemptive scheduling discipline leads to Equation 1, where $f_i^j$ is the finish time of $\tau_i^j$. Another condition for non-preemptive scheduling is that, given any $i$, $j$, $k$ and $l$, if $s_i^j < s_k^l$ then $f_i^j \le s_k^l$. This means that the schedules of any two instances do not overlap. The constructed schedule of length LCM is invoked repeatedly by wrapping around the end point of the first schedule to the start point of the next one. Hence, as shown in Equation 2, the start time of the first instance in the next schedule is exactly one LCM away from that of the first schedule. Finally, Equations 3 and 4 specify the relative timing constraints between two consecutive instances of a task.
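The conditions above can be checked mechanically for a candidate single-processor schedule. This is a minimal sketch under stated assumptions: each task is given as the 4-tuple (p_i, e_i, λ_i, η_i) with integer times, the start times of each task are listed in increasing order within one LCM frame, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def is_feasible(
    tasks: Dict[str, Tuple[int, int, int, int]], starts: Dict[str, List[int]], lcm: int
) -> bool:
    """Check Equations 1-4 and non-overlap for a single-processor schedule.

    tasks:  name -> (p_i, e_i, lambda_i, eta_i)
    starts: name -> start times s_i^1 < ... < s_i^{n_i} within one LCM frame
    """
    slots = []
    for name, (p, e, lam, eta) in tasks.items():
        s = starts[name]
        if len(s) != lcm // p:
            return False                        # need n_i = LCM / p_i instances
        ext = s + [s[0] + lcm]                  # Equation 2: s^{n_i+1} = s^1 + LCM
        for prev, cur in zip(ext, ext[1:]):     # Equations 3 and 4, j = 2..n_i+1
            if not (p - lam <= cur - prev <= p + eta):
                return False
        slots += [(x, x + e) for x in s]        # Equation 1: f_i^j = s_i^j + e_i
    if not slots:
        return True
    slots.sort()
    no_overlap = all(f <= s2 for (_, f), (s2, _) in zip(slots, slots[1:]))
    wraps = slots[-1][1] <= slots[0][0] + lcm   # last slot ends before the next frame's first
    return no_overlap and wraps
```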

3  Analysis  of  Relative  Timing  Constraints 

Define the scheduling window for a task instance as the time interval during which the task can start. Traditionally, the lower and upper bounds of the scheduling window for a task instance are called the earliest start time (est) and latest start time (lst) respectively. These values are given and are independent of the start times of the preceding instances.


Instance ID   est = $s_i^{j-1} + p_i - \lambda_i$   lst = $s_i^{j-1} + p_i + \eta_i$   actual start time ($s_i^j$)
$\tau_i^1$    0                                    40                                   4
$\tau_i^2$    39                                   49                                   40
$\tau_i^3$    75                                   85                                   77
$\tau_i^4$    112                                  122                                  113
$\tau_i^5$    148                                  158                                  *

Table 1: An example to show the wrong setting of scheduling windows


We consider the scheduling of periodic tasks with the relative timing constraints described in Equations 3 and 4. The scheduling window for a task instance is derived from the start times of its preceding instances. A feasible scheduling window for a task instance $\tau_i^j$ is a scheduling window in which any start time makes the timing relations between $\tau_i^j$ and the other instances of $\tau_i$ satisfy Equations 3 and 4. Formally, given $s_i^1, s_i^2, \ldots, s_i^{j-1}$, the problem is to derive the feasible scheduling window for $\tau_i^j$ such that a feasible schedule can be obtained if $\tau_i^j$ is scheduled within the window.

For the sake of simplicity, we assume that $r_i = 0$ and $d_i = p_i$ for all $i$ in this section. Then, simply assigning the est and lst of $\tau_i^j$ as $s_i^{j-1} + p_i - \lambda_i$ and $s_i^{j-1} + p_i + \eta_i$ respectively, where $i = 1, 2, \ldots, n$ and $j = 2, \ldots, n_i$, is not tight enough to guarantee a feasible solution. For example, consider the case shown in Table 1, in which a periodic task $\tau_i$ is to be scheduled. Let LCM, $p_i$, $\lambda_i$, and $\eta_i$ be 200, 40, 5, and 5 respectively. Hence, there are 5 instances within one LCM (i.e., $n_i = 5$). The first column in Table 1 indicates the instance IDs. The second and third columns give the est and lst of the scheduling windows for the task instances specified in the first column. The last column shows the actual start times scheduled for the particular task instances. The actual start time is a value between the est and lst of each task instance. For instance, the est and lst of $\tau_i^2$ are 39 and 49 respectively, which means $39 \le s_i^2 \le 49$. The scheduled value for $s_i^2$ in the example is 40. Since $s_i^6 = s_i^1 + LCM = 204$, Equations 3 and 4 require $159 \le s_i^5 \le 169$, so no value in the interval [148, 158] can satisfy the relative timing constraints between $\tau_i^5$ and $\tau_i^6$. As a consequence, the constructed schedule is infeasible.
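The failure in Table 1 can be reproduced numerically: the naive window of the fifth instance and the window forced by the wrap-around to $s_i^6 = s_i^1 + LCM$ do not intersect. A small sketch using the numbers of the example (the function names are hypothetical):

```python
from typing import Tuple


def naive_window(prev_start: int, p: int, lam: int, eta: int) -> Tuple[int, int]:
    """Table 1 windows: est = s^{j-1} + p - lambda, lst = s^{j-1} + p + eta."""
    return prev_start + p - lam, prev_start + p + eta


def wraparound_window(first_start: int, lcm: int, p: int, lam: int, eta: int) -> Tuple[int, int]:
    """Start times of the last instance that are compatible with s^{n+1} = s^1 + LCM."""
    nxt = first_start + lcm
    return nxt - (p + eta), nxt - (p - lam)


naive = naive_window(113, 40, 5, 5)            # window of the fifth instance in Table 1
needed = wraparound_window(4, 200, 40, 5, 5)
print(naive, needed)                           # (148, 158) (159, 169)
print(naive[1] < needed[0])                    # True: the windows are disjoint
```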

We depict the relations among the start times of the task instances in Figure 1. When $\tau_i^j$ is taken into account, the scheduling window for $s_i^j$ is obtained by considering its relation with $s_i^{j-1}$ as well as its relation with $s_i^{n_i}$ and $s_i^{n_i+1}$. We make sure that once $s_i^j$ is determined, the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible scheduling window for $\tau_i^{n_i}$. Namely, the interval specified by the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$, overlaps the interval $[s_i^{n_i+1} - (p_i + \eta_i),\ s_i^{n_i+1} - (p_i - \lambda_i)]$.


Figure  1:  The  relations  between  the  task  instances 

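Proposition 3.1: Let the est and lst of $\tau_i^j$ be

$est(\tau_i^j) = \max\{(s_i^{j-1} + p_i - \lambda_i),\ (s_i^1 + (j-1) \times p_i - (n_i - j + 1) \times \eta_i)\}$   (5)

and

$lst(\tau_i^j) = \min\{(s_i^{j-1} + p_i + \eta_i),\ (s_i^1 + (j-1) \times p_i + (n_i - j + 1) \times \lambda_i)\}$.   (6)

If $s_i^j$ lies between $est(\tau_i^j)$ and $lst(\tau_i^j)$, then the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$ and $s_i^{n_i+1}$, specify a feasible window.

Proof: Let $\epsilon$ and $\mu$ be the estimated est and lst of $s_i^{n_i}$, based on $s_i^j$, respectively. Hence,

$\epsilon = s_i^j + (n_i - j) \times (p_i - \lambda_i)$   (7)

$\mu = s_i^j + (n_i - j) \times (p_i + \eta_i)$   (8)

To guarantee the existence of a feasible start time of $\tau_i^{n_i}$, the interval $[\epsilon, \mu]$ has to overlap the interval $[s_i^{n_i+1} - (p_i + \eta_i),\ s_i^{n_i+1} - (p_i - \lambda_i)]$. Hence the following conditions have to be satisfied:

$s_i^{n_i+1} - \epsilon \ge p_i - \lambda_i$   (9)

$s_i^{n_i+1} - \mu \le p_i + \eta_i$   (10)

By replacing $\epsilon$ in Equation 9 with $s_i^j + (n_i - j) \times (p_i - \lambda_i)$, and using $s_i^{n_i+1} = s_i^1 + LCM$ with $LCM = n_i \times p_i$, we obtain

$s_i^j \le s_i^{n_i+1} - (n_i - j + 1) \times (p_i - \lambda_i)$
$\quad = s_i^1 + LCM - (n_i - j + 1) \times (p_i - \lambda_i)$
$\quad = s_i^1 + n_i \times p_i - (n_i - j + 1) \times (p_i - \lambda_i)$
$\quad = s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i$   (11)

Likewise, by replacing $\mu$ in Equation 10 with $s_i^j + (n_i - j) \times (p_i + \eta_i)$, we have

$s_i^j \ge s_i^{n_i+1} - (n_i - j + 1) \times (p_i + \eta_i)$
$\quad = s_i^1 + LCM - (n_i - j + 1) \times (p_i + \eta_i)$
$\quad = s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i$   (12)

So, according to Equations 12 and 3, we choose the larger value between $(s_i^{j-1} + p_i - \lambda_i)$ and $(s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i)$ as the est of $\tau_i^j$. Similarly, according to Equations 11 and 4, we assign the smaller value of $(s_i^{j-1} + p_i + \eta_i)$ and $(s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i)$ as the lst.

□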

Example 3.1: To show how Proposition 3.1 gives a tighter bound for finding feasible scheduling windows, we consider the case shown in Table 1 again. We apply Equations 5 and 6 to compute the est and lst of each instance. The results are shown in Table 2. Note that the scheduling windows for $\tau_i^4$ and $\tau_i^5$ are tighter than those in Table 1. As a consequence, any start time in the interval [159, 160] for $\tau_i^5$ satisfies the relative timing constraints between $\tau_i^5$ and $\tau_i^6$.
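Equations 5 and 6 are straightforward to evaluate. The Python sketch below (the function name `window` is hypothetical) recomputes the windows listed in Table 2 from the actual start times chosen there, with $p_i = 40$, $\lambda_i = \eta_i = 5$ and $n_i = 5$.

```python
from typing import List, Tuple


def window(starts: List[int], j: int, p: int, lam: int, eta: int, n: int) -> Tuple[int, int]:
    """est and lst of instance j (2 <= j <= n) from Equations 5 and 6.

    starts[k - 1] holds s_i^k for the instances already scheduled (k < j).
    """
    s1, prev = starts[0], starts[j - 2]
    est = max(prev + p - lam, s1 + (j - 1) * p - (n - j + 1) * eta)
    lst = min(prev + p + eta, s1 + (j - 1) * p + (n - j + 1) * lam)
    return est, lst


starts = [4, 40, 77, 115]   # actual start times chosen in Table 2
for j in range(2, 6):
    print(j, window(starts, j, p=40, lam=5, eta=5, n=5))
# 2 (39, 49)
# 3 (75, 85)
# 4 (114, 122)
# 5 (159, 160)
```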

3.1  Property  of  Scheduling  Windows 

Define $P_i(x, y, z)$ as the predicate that the estimated est and lst of $\tau_i^y$, based on $s_i^x$ and $s_i^z$, specify a feasible scheduling window for $\tau_i^y$. In Proposition 3.1, we prove that for any $s_i^j$ between $est(\tau_i^j)$ and $lst(\tau_i^j)$ as specified in Equations 5 and 6, $P_i(j, n_i, n_i + 1)$ is true.


Instance ID   est from Equation 5   lst from Equation 6   actual start time ($s_i^j$)
$\tau_i^1$    0                     40                    4
$\tau_i^2$    39                    49                    40
$\tau_i^3$    75                    85                    77
$\tau_i^4$    114                   122                   115
$\tau_i^5$    159                   160                   159-160

Table 2: The correct setting of scheduling windows based on Proposition 3.1.


Lemma  1  Given  s],  s?,  . . .,  and  if,  V  fc  =  2,  . . .,  j,  estfr-^)  <  <  Ist^r^j  as  specified  in 

Equations  5  and  6,  then  Pi{j,  y,  n,-  +  1)  is  true,  V  j  =  j  +  1,  j  +  2,  ...,71,-. 

Proof:  We  prove  that  the  estimated  est  and  1st  of  rf,  based  on  s^  and  5”’'*’^,  specify  a  feasible 

scheduling  window,  by  showing  that  (1)  the  estimated  scheduling  window  of  sj,  based  on  s^,  is 
specified  by  the  interval 

[4  +  (y  - ;)  (Pi  -  ^0.  'Si  +  (y  -  j)  (pi  +  ^li)],  (i3) 

(2)  the  estimated  scheduling  window  of  sf,  based  on  is  specified  by  the  interval 

_  (n,-  _  y  +  1)  X  ipi  +  Pif  -  (n,  _  y  +  1)  X  ipi  -  A.-)],  (14) 

and  (3)  the  intervals  in  Equations  13  and  14  overlap. 

In Figure 2, we see that the necessary and sufficient conditions for the intervals specified in Equations 13 and 14 to overlap are

$s_i^j + (y - j)(p_i - \lambda_i) \le s_i^{n_i+1} - (n_i - y + 1)(p_i - \lambda_i)$ (15)

and $s_i^{n_i+1} - (n_i - y + 1)(p_i + \eta_i) \le s_i^j + (y - j)(p_i + \eta_i)$. (16)

By solving Equations 15 and 16 with $s_i^{n_i+1} = s_i^1 + n_i \times p_i$ (the terms in y cancel), we obtain

$s_i^j \le s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i$

and $s_i^j \ge s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i$.

These two inequalities describe the same conditions as Equations 11 and 12. Hence, $P_i(j, y, n_i + 1)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$.
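The algebra that turns conditions 15 and 16 into the two bounds on $s_i^j$ can be checked by brute force. The sketch below uses arbitrarily chosen integer parameters, with $\lambda_i \ne \eta_i$ so that a sign error would show up, and compares the interval-overlap test with the solved bounds for every $j < y \le n_i$ over a range of $s_i^j$ values.

```python
def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def windows_overlap(s1, sj, j, y, n, p, lam, eta):
    """Conditions (15) and (16): do intervals (13) and (14) overlap?"""
    s_next = s1 + n * p  # s_i^{n_i+1} = s_i^1 + n_i * p_i
    i13 = (sj + (y - j) * (p - lam), sj + (y - j) * (p + eta))
    i14 = (s_next - (n - y + 1) * (p + eta), s_next - (n - y + 1) * (p - lam))
    return overlaps(i13, i14)


def solved_bounds(s1, sj, j, n, p, lam, eta):
    """The bounds on s_i^j obtained by solving (15) and (16)."""
    lo = s1 + (j - 1) * p - (n - j + 1) * eta
    hi = s1 + (j - 1) * p + (n - j + 1) * lam
    return lo <= sj <= hi


def check(p=40, lam=5, eta=7, s1=0, max_n=6):
    """Compare both forms for all n <= max_n, j < y <= n, and a range of s_i^j."""
    for n in range(1, max_n + 1):
        for j in range(1, n + 1):
            for y in range(j + 1, n + 1):
                for sj in range(-100, p * (max_n + 2)):
                    if windows_overlap(s1, sj, j, y, n, p, lam, eta) != \
                            solved_bounds(s1, sj, j, n, p, lam, eta):
                        return False
    return True


print(check())  # True
```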


Figure 2: The overlapping of two intervals (the intervals of Equations 13 and 14)


□ 

Lemma 2 Given $s_i^1, \ldots, s_i^j$ and an integer $n_0$, where $1 \le n_0 \le j$, if $\forall k = 2, \ldots, j$, $est(\tau_i^k) \le s_i^k \le lst(\tau_i^k)$ as specified in Equations 5 and 6, then $P_i(j, y, n_i + n_0)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$.

Proof: We use the same method as in Lemma 1. We show that (1) the estimated scheduling window of $s_i^y$, based on $s_i^j$, is specified by the interval

$[s_i^j + (y - j)(p_i - \lambda_i),\ s_i^j + (y - j)(p_i + \eta_i)]$, (17)

(2) the estimated scheduling window of $s_i^y$, based on $s_i^{n_i+n_0}$, is specified by the interval

$[s_i^{n_i+n_0} - (n_i + n_0 - y)(p_i + \eta_i),\ s_i^{n_i+n_0} - (n_i + n_0 - y)(p_i - \lambda_i)]$, (18)

and (3) these two intervals overlap.

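The following conditions have to be satisfied to ensure that the two intervals overlap:

$s_i^j \le s_i^{n_0} + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i - (p_i - \lambda_i) \times (n_0 - 1)$ (19)

and $s_i^j \ge s_i^{n_0} + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i - (p_i + \eta_i) \times (n_0 - 1)$. (20)

Since consecutive instances start at least $p_i - \lambda_i$ and at most $p_i + \eta_i$ apart, $s_i^1 \le s_i^{n_0} - (p_i - \lambda_i) \times (n_0 - 1)$ and $s_i^1 \ge s_i^{n_0} - (p_i + \eta_i) \times (n_0 - 1)$. Therefore Equations 19 and 20 follow from the bounds of Lemma 1:

$s_i^j \le s_i^1 + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i \le s_i^{n_0} + (j - 1) \times p_i + (n_i - j + 1) \times \lambda_i - (p_i - \lambda_i) \times (n_0 - 1)$

and $s_i^j \ge s_i^1 + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i \ge s_i^{n_0} + (j - 1) \times p_i - (n_i - j + 1) \times \eta_i - (p_i + \eta_i) \times (n_0 - 1)$.

Hence $P_i(j, y, n_i + n_0)$ holds for any $1 \le n_0 \le j$.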


□ 


Theorem 1 Given $s_i^1, \ldots, s_i^j$, if $\forall k = 2, \ldots, j$, $est(\tau_i^k) \le s_i^k \le lst(\tau_i^k)$ as specified in Equations 5 and 6, then $P_i(j, y, z)$ is true, $\forall y = j + 1, j + 2, \ldots, n_i$, and $z = n_i + 1, n_i + 2, \ldots, n_i + j$.

By combining the proofs of Lemmas 1 and 2, it is easy to see that Theorem 1 holds. Based on Theorem 1, we can assign the scheduling window for $\tau_i^{j+1}$ by using Equations 5 and 6 once $s_i^1, s_i^2, \ldots, s_i^j$ are determined.

Before we present the scheduling technique for a task instance, let us consider the following objective. Given a set of tasks with the characteristics described in Section 2, we schedule the task instances of each task within one LCM to minimize

$\sum_{i=1}^{n} \sum_{j=1}^{n_i} \phi(s_i^{j+1} - s_i^j - p_i)$ (21)

subject to the constraints specified in Equations 1 through 4, where $\phi(x) = x$ if $x \ge 0$, and $\phi(x) = -x$ otherwise.

Basically, we try to schedule every instance of a task one period after its preceding instance. An optimal schedule is a feasible schedule with the minimum total deviation from this one-period spacing between consecutive instances.
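Objective 21 can be evaluated directly for a given schedule. This sketch assumes that the inner sum runs up to $n_i$, with $s_i^{n_i+1}$ taken as the first instance of the next LCM ($s_i^1$ plus the LCM); the function and argument names are illustrative. For the Table 2 instances with $s_i^5 = 159$, an inferred period of 40, and an inferred LCM of 200, the deviations are 4, 3, 2, 4, and 5.

```python
def phi(x):
    """phi(x) = x if x >= 0, and -x otherwise (the absolute value)."""
    return x if x >= 0 else -x


def total_deviation(starts, periods, lcm):
    """Objective (21): sum over tasks and instances of phi(s^{j+1} - s^j - p).

    starts[i] lists s_i^1 .. s_i^{n_i}; s_i^{n_i+1} is assumed to be s_i^1 + lcm.
    """
    total = 0
    for s, p in zip(starts, periods):
        extended = s + [s[0] + lcm]  # wrap to the first instance of the next LCM
        total += sum(phi(b - a - p) for a, b in zip(extended, extended[1:]))
    return total


print(total_deviation([[4, 40, 77, 115, 159]], [40], 200))  # 18
```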


4  The  Time-Based  Scheduling  of  a  Task  Instance 

We consider a time-based solution to the scheduling problem using a linked list. Each element in the list represents a time slot assigned to a task instance. A time slot w has the following fields: (1) task id i and instance id j, which identify the time slot; (2) start time $s_i^j$ and finish time $f_i^j$, which indicate the start time and completion time of $\tau_i^j$, respectively; (3) prev ptr and next ptr, which are the pointers to the preceding and succeeding time slots, respectively. We arrange the time slots in the list in increasing order, using the start time as the key. Any two time slots are non-overlapping. Since the execution of an instance is non-preemptable, the difference between the start time and the finish time equals the execution time of the task.

Figure 3: Insertion of a new time slot
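A minimal Python sketch of the time-slot list follows. The class and method names are hypothetical; insertion walks the list to keep the slots sorted by start time and asserts the non-overlap invariant rather than resolving conflicts, which the scheduler is expected to do before inserting.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimeSlot:
    task_id: int      # i
    instance_id: int  # j
    start: float      # s_i^j
    finish: float     # f_i^j = s_i^j + e_i (non-preemptive execution)
    # Links are excluded from repr/eq so comparisons do not walk the whole list.
    prev: Optional["TimeSlot"] = field(default=None, repr=False, compare=False)
    next: Optional["TimeSlot"] = field(default=None, repr=False, compare=False)


class SlotList:
    """Doubly linked list of non-overlapping slots in increasing start order."""

    def __init__(self):
        self.head: Optional[TimeSlot] = None

    def insert(self, slot: TimeSlot) -> None:
        prev, cur = None, self.head
        while cur is not None and cur.start < slot.start:
            prev, cur = cur, cur.next
        # The caller must pass a slot that fits in the gap between prev and cur.
        assert prev is None or prev.finish <= slot.start, "overlaps predecessor"
        assert cur is None or slot.finish <= cur.start, "overlaps successor"
        slot.prev, slot.next = prev, cur
        if prev is None:
            self.head = slot
        else:
            prev.next = slot
        if cur is not None:
            cur.prev = slot

    def __iter__(self):
        cur = self.head
        while cur is not None:
            yield cur
            cur = cur.next
```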

4.1  Creating  a  Time  Slot  for  the  Task  Instance 

Consider a set of n tasks. Given a linked list and a task instance $\tau_i^j$, we schedule the instance by inserting a time slot into the list. We first compute $est(\tau_i^j)$ and $lst(\tau_i^j)$ according to Equations 5 and 6. Let S be the set of unoccupied time intervals in the linked list that overlap the interval $[est(\tau_i^j), lst(\tau_i^j)]$. The unoccupied intervals in S are collected by traversing the list. Each time a pair of adjacent time slots $(w, w+1)$ is examined, we compute $\ell = \max\{est(\tau_i^j), ft(w)\}$ and $\mu = \min\{lst(\tau_i^j), st(w+1)\}$, where $ft(w)$ is the finish time of time slot w and $st(w+1)$ is the start time of the slot next to w. If $\ell < \mu$, then we add the interval $[\ell, \mu]$ to S.
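The collection of S can be sketched as a single pass over adjacent slot pairs. Slots are represented here as sorted (start, finish) tuples rather than linked-list nodes, and `l` and `mu` stand for the lower and upper bounds computed above. Padding the list with sentinel slots at minus and plus infinity, so that the gaps before the first slot and after the last slot are also considered, is an assumption of this sketch rather than something the text specifies.

```python
import math


def free_intervals(slots, est, lst):
    """Collect S: the unoccupied intervals [l, mu] that overlap [est, lst].

    slots: (start, finish) pairs, sorted by start and non-overlapping.
    """
    padded = [(-math.inf, -math.inf)] + list(slots) + [(math.inf, math.inf)]
    S = []
    for (_, f_w), (s_next, _) in zip(padded, padded[1:]):
        l = max(est, f_w)      # l  = max{est, ft(w)}
        mu = min(lst, s_next)  # mu = min{lst, st(w+1)}
        if l < mu:
            S.append((l, mu))
    return S


print(free_intervals([(0, 5), (10, 15), (30, 40)], est=3, lst=35))  # [(5, 10), (15, 30)]
```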

The free intervals in S are the potential time slots to which $\tau_i^j$ can be assigned. Since we try to schedule $\tau_i^j$ as close to one period after its preceding instance as possible, we sort S in ascending order of $\phi(s_i^{j-1} + p_i - \ell)$, where $\ell$ is the lower bound of each interval. Without loss of generality, we assume that S after sorting is denoted by $\{int_1, int_2, \ldots, int_{|S|}\}$.

The idea is that if $\tau_i^j$ is scheduled to $int_k$, then the value in Equation 21 will be no larger than in the case in which $\tau_i^j$ is scheduled to $int_{k+1}$.
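Sorting S by distance from the ideal start time can be written with a key function. In this sketch $\phi$ is taken as the absolute value, as in Equation 21, and the names are illustrative.

```python
def sort_candidates(S, prev_start, period):
    """Sort free intervals (l, mu) by phi(s_i^{j-1} + p_i - l), ascending."""
    ideal = prev_start + period  # one period after the preceding instance
    return sorted(S, key=lambda iv: abs(ideal - iv[0]))


print(sort_candidates([(5, 10), (15, 30), (44, 50)], prev_start=0, period=40))
```

With an ideal start of 40, the interval starting at 44 (distance 4) comes first, then 15 (distance 25), then 5 (distance 35).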

The scheduling of $\tau_i^j$ can be described as follows. Starting from $int_1$, we check whether the length of the interval is greater than or equal to the execution time of $\tau_i^j$. If it is, we schedule the instance to the interval: a new time slot is created whose start time is the lower bound of the interval and whose finish time equals the start time plus the execution time. The created time slot is added to the linked list, and the scheduling is done. If the length is smaller than the execution time, we check the length of the next interval, until all intervals have been examined. An example is shown in Figure 3, in which the slot with the dark area represents $\tau_i^j$. In this example we assume that $est(\tau_i^j) \le f_1$ and $s_2 - f_1 \ge e$, which means that the free slot between the first and second occupied slots can be assigned to $\tau_i^j$.
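The first-fit step over the sorted intervals then reduces to a short loop. In this sketch with illustrative names, returning `None` stands in for handing the instance over to the sliding technique of Section 4.2.

```python
def first_fit(sorted_S, e):
    """Return (start, finish) in the first interval long enough for e, else None."""
    for l, mu in sorted_S:
        if mu - l >= e:        # interval length >= execution time
            return (l, l + e)  # start at the lower bound of the interval
    return None                # no interval fits; sliding is needed
```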


4.2  Sliding  of  the  Time  Slots 


In case none of the intervals in S can accommodate a task instance, the sliding technique is used to create a large enough interval by sliding the existing time slots in the list.

To make the sliding technique work, we maintain two values for each time slot: left laxity and right laxity. The left laxity indicates the number of time units by which a time slot can be left-shifted to an earlier start time. Similarly, the right laxity indicates the number of time units by which a time slot can be right-shifted to a later start time.

Given the time slots $w_{k-1}$, $w_k$, and $w_{k+1}$, where a and b are the task and instance identifiers of $w_k$ respectively, the laxity values of the time slot $w_k$ can be computed by:

$left\_laxity(w_k) = \min\{s_a^b - est',\ s_a^b - ft(w_{k-1}) + left\_laxity(w_{k-1})\}$ (22)

$right\_laxity(w_k) = \min\{lst' - s_a^b,\ st(w_{k+1}) - f_a^b + right\_laxity(w_{k+1})\}$ (23)

where $est' = \max\{est(\tau_a^b),\ s_a^{b+1} - (p_a + \eta_a)\}$ and $lst' = \min\{lst(\tau_a^b),\ s_a^{b+1} - (p_a - \lambda_a)\}$.

Note that the interval $[est', lst']$ defines the sliding range within which $\tau_a^b$ can start without shifting $\tau_a^{b-1}$ or $\tau_a^{b+1}$. A schematic illustration of Equations 22 and 23 is given in Figure 4.

From Equations 22 and 23, we see that the computation of $left\_laxity(w_k)$ depends on that of $w_{k-1}$, and the computation of $right\_laxity(w_k)$ depends on that of $w_{k+1}$. This implies that a two-pass computation (a forward pass for the left laxities and a backward pass for the right laxities) is needed to compute the laxity values for all time slots. The complexity is O(2N), where N is the number of time slots in the linked list.

Figure 4: An illustration of $left\_laxity(w_k)$ and $right\_laxity(w_k)$ (sliding range $[est', lst']$)
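The two passes can be sketched as follows. Each slot is a dict whose `est_p` and `lst_p` fields hold the $est'$ and $lst'$ of Equations 22 and 23, assumed to be precomputed. Treating a missing predecessor (for the first slot) or a missing successor (for the last slot) as imposing no limit is an assumption, since the boundary case is not specified here.

```python
def laxities(slots):
    """Two-pass computation of Equations 22 and 23.

    slots: list of dicts sorted by start, with keys
           'start', 'finish', 'est_p' (est') and 'lst_p' (lst').
    Returns (left, right) lists of laxities, one entry per slot.
    """
    n = len(slots)
    left = [0] * n
    right = [0] * n
    for k in range(n):                  # forward pass: uses left[k - 1]
        w = slots[k]
        bound = w["start"] - w["est_p"]
        if k > 0:                       # assumption: no predecessor, no limit
            gap = w["start"] - slots[k - 1]["finish"]
            bound = min(bound, gap + left[k - 1])
        left[k] = bound
    for k in range(n - 1, -1, -1):      # backward pass: uses right[k + 1]
        w = slots[k]
        bound = w["lst_p"] - w["start"]
        if k < n - 1:                   # assumption: no successor, no limit
            gap = slots[k + 1]["start"] - w["finish"]
            bound = min(bound, gap + right[k + 1])
        right[k] = bound
    return left, right
```

Each pass touches every slot once, which is the O(2N) cost stated above.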

The basic idea of the sliding technique is as follows. Given a task instance $\tau_i^j$ and a set of unoccupied intervals $S = \{int_1, int_2, \ldots, int_{|S|}\}$, we check one interval at a time to see whether the interval can be enlarged by shifting the existing time slots. There are two possible ways of enlargement: (1) shifting the time slots that precede the interval to the left, or (2) shifting the slots that follow the interval to the right. The direction is chosen to minimize the objective function in Equation 21.

4.3  The  Algorithm 

An algorithmic description of how to schedule a task instance, as described in Sections 4.1 and 4.2, is given in Table 3.

The procedures Left_Shift($w_k$, time_units) and Right_Shift($w_k$, time_units) in Table 3 may recursively shift more than one time slot. For example, consider the case in Figure 4: if Right_Shift($w_k$, $lst' - s_a^b$) is invoked (i.e., $w_k$ is to be shifted right by $lst' - s_a^b$ time units), then $w_{k+1}$ has to be shifted too, because the gap between $w_k$ and $w_{k+1}$, which is $st(w_{k+1}) - f_a^b$, is smaller than $lst' - s_a^b$. In this case, Right_Shift($w_{k+1}$, $lst' - s_a^b - st(w_{k+1}) + f_a^b$) is invoked.
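The recursive Right_Shift can be sketched on a list of [start, finish] pairs. The sketch assumes the caller has already checked the requested amount against the right laxity of Equation 23, so it only propagates the shift: each successor is pushed by the part of the shift that its gap cannot absorb, which is the $lst' - s_a^b - st(w_{k+1}) + f_a^b$ amount in the example above. A Left_Shift would mirror it toward the head of the list.

```python
def right_shift(slots, k, units):
    """Shift slot k right by `units`, recursively pushing successors that would overlap.

    slots: list of [start, finish] pairs sorted by start (mutated in place).
    Assumes units <= right_laxity(slot k); timing constraints are not rechecked.
    """
    if k + 1 < len(slots):
        gap = slots[k + 1][0] - slots[k][1]  # st(w_{k+1}) - f(w_k)
        if gap < units:
            right_shift(slots, k + 1, units - gap)  # push the successor first
    slots[k][0] += units
    slots[k][1] += units
```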

We do not enlarge an interval at both ends. Enlarging an interval at both ends requires shifting some preceding time slots to the left and some succeeding slots to the right. It is possible that some task instance $\tau_a^b$ is shifted left while $\tau_a^{b+1}$ is shifted right. As a consequence, the timing constraints between $s_a^b$ and $s_a^{b+1}$ could be violated. For example, let $s_a^b$ and $s_a^{b+1}$ before the shifting be 10 and 20, respectively, and let the execution time of $\tau_a^b$ be 5 time units. Assume that the left laxity of $\tau_a^b$ is 5, the right laxity of $\tau_a^{b+1}$ is 5, and the relative timing constraint requires $s_a^{b+1} - s_a^b \le 15$. Consider the scheduling of a task instance $\tau_i^j$ with execution time 15. If we enlarge the interval between $\tau_a^b$ and $\tau_a^{b+1}$ by shifting $\tau_a^b$ left 5 time units and $\tau_a^{b+1}$ right 5 time units, then we get a new interval of 15 time units for $\tau_i^j$. However, it turns out that $s_a^{b+1} = 25$ and $s_a^b = 5$, so $s_a^{b+1} - s_a^b = 20$, and the relative timing constraint between $\tau_a^b$ and $\tau_a^{b+1}$ is violated.
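The arithmetic of this example can be replayed in a few lines. The sketch uses only the numbers given above and shows that either one-sided shift alone opens a 10-unit gap within the constraint, while the two-sided shift opens the required 15 units but breaks the 15-unit bound on $s_a^{b+1} - s_a^b$.

```python
# Numbers from the example: s_a^b = 10, s_a^{b+1} = 20, execution time 5,
# and the relative constraint s_a^{b+1} - s_a^b <= 15.
s_b, s_b1, e_b = 10, 20, 5
max_sep = 15


def gap_and_ok(shift_left, shift_right):
    """Free gap between the two slots, and whether the constraint still holds."""
    new_b, new_b1 = s_b - shift_left, s_b1 + shift_right
    gap = new_b1 - (new_b + e_b)  # from the finish of tau_a^b to the start of tau_a^{b+1}
    return gap, new_b1 - new_b <= max_sep


print(gap_and_ok(5, 0))  # (10, True): left end only
print(gap_and_ok(0, 5))  # (10, True): right end only
print(gap_and_ok(5, 5))  # (15, False): both ends, constraint violated
```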

5  The  Priority-Based  Scheduling  of  a  Task  Set 

We consider priority-based algorithms for scheduling a set of periodic tasks with hybrid timing constraints. Given a set of periodic tasks $T = \{\tau_i \mid i = 1, \ldots, n\}$ with the task characteristics described in Section 2, we compute the LCM of all periods. Each task $\tau_i$ is extended to $n_i$ task instances: $\tau_i^1, \tau_i^2, \ldots, \tau_i^{n_i}$. A scheduling algorithm $\sigma$ for T totally orders the instances of all tasks within the LCM, namely, $\sigma$: task_id × instance_id → integer.
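Computing the LCM and the instance counts $n_i$ is straightforward for integer periods. This sketch uses `math.lcm` (Python 3.9 or later); the example periods are those of the second experiment in Section 6.2.

```python
from math import lcm


def expand_instances(periods):
    """Return the LCM of all (integer) periods and n_i = LCM / p_i for each task."""
    L = lcm(*periods)
    return L, [L // p for p in periods]


print(expand_instances([20, 30, 50, 60, 100, 150, 300]))  # (300, [15, 10, 6, 5, 3, 2, 1])
```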

Three algorithms are considered: smallest latest-start-time first (SLsF), smallest period first (SPF), and smallest jitter first (SJF).

5.1  SLsF 

The scheduling window for a task instance $\tau_i^j$ depends on the scheduling of its preceding instance. Once $s_i^{j-1}$ is determined, the scheduling window of the instance can be computed by Equations 5 and 6. The scheduling window for the first instance of a task $\tau_i$ is defined as $[r_i, d_i - e_i]$.

The idea of the SLsF algorithm is to pick, at each step, the candidate instance with the minimum lst among all tasks. One counter for each task is maintained to indicate its candidate instance, and all counters are initialized to 1. Each time a task instance with the smallest lst is chosen, the algorithm in Table 3 is invoked to schedule the instance. After the instance has been scheduled, its counter is increased by one. The counter for $\tau_i$ overflows when it reaches $n_i + 1$, which means that all the instances of $\tau_i$ have been scheduled. The algorithm terminates when all counters overflow.

We can compute the relative deadline of a task instance by adding the execution time to its lst. If the execution times of all tasks are identical, ordering by lst is the same as ordering by deadline, so the SLsF algorithm is equivalent to the earliest deadline first (EDF) algorithm.
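The SLsF loop can be sketched independently of the window and slot details. Both callbacks are hypothetical stand-ins: `window` for Equations 5 and 6, and `schedule_an_instance` for Table 3, which in this sketch simply returns the chosen start time. Ties on lst are broken by task order, an arbitrary choice.

```python
def slsf(tasks, window, schedule_an_instance):
    """Smallest latest-start-time first over one LCM.

    tasks: dict task_id -> n_i (number of instances within the LCM).
    window(i, j, starts) -> (est, lst) of tau_i^j given the starts so far.
    schedule_an_instance(i, j, est, lst) -> chosen start time of tau_i^j.
    """
    counters = {i: 1 for i in tasks}  # candidate instance of each task
    starts = {i: [] for i in tasks}
    order = []
    while any(counters[i] <= tasks[i] for i in tasks):  # stop when all overflow
        best = None
        for i in tasks:  # candidate with the minimum lst among all tasks
            j = counters[i]
            if j <= tasks[i]:
                est, lst = window(i, j, starts)
                if best is None or lst < best[0]:
                    best = (lst, est, i, j)
        lst, est, i, j = best
        starts[i].append(schedule_an_instance(i, j, est, lst))
        order.append((i, j))
        counters[i] += 1
    return order, starts
```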

5.2  SPF 

The task periods determine the LCM of T and the numbers of instances of the tasks within the LCM. In most cases, the task with the smaller period has the tighter timing constraints, namely, $(\lambda_i + \eta_i) \le (\lambda_j + \eta_j)$ if $p_i \le p_j$. To help the tasks with smaller periods meet their timing constraints, the SPF algorithm favors the tasks with smaller periods.

The SPF algorithm uses the period as the key to arrange all tasks in non-decreasing order. The task with the smallest period is scheduled first. The instances of a particular task are scheduled one by one by invoking the algorithm in Table 3. After all the instances of a task are scheduled, the next task in the sequence is scheduled.

5.3  SJF 

We define the jitter of a task $\tau_i$ as $(\lambda_i + \eta_i)$. It is proportional to the range of the scheduling window. Hence, the schedulability of a task also depends on its jitter.

Instead of using the period as the measure, the SJF algorithm assigns higher priority to the tasks with smaller jitters. The task with the smallest jitter is scheduled first.
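SPF and SJF differ only in the sort key, and both then schedule all instances of one task before moving to the next. In the sketch below, tasks are dicts with illustrative keys. With the second experiment's jitter rule (Section 6.2), a task with $p = 20$ and $e = 5$ gets $\lambda = \eta = 12$, while one with $p = 50$ and $e = 1$ gets $\lambda = \eta = 7$, so the two orders disagree.

```python
def spf_order(tasks):
    """SPF: tasks in non-decreasing order of period."""
    return sorted(tasks, key=lambda t: t["p"])


def sjf_order(tasks):
    """SJF: tasks in non-decreasing order of jitter lambda_i + eta_i."""
    return sorted(tasks, key=lambda t: t["lam"] + t["eta"])


def instance_sequence(ordered_tasks):
    """All instances of the first task, then all of the next task, and so on."""
    return [(t["id"], j) for t in ordered_tasks for j in range(1, t["n"] + 1)]


t1 = {"id": 1, "p": 20, "lam": 12, "eta": 12, "n": 15}
t2 = {"id": 2, "p": 50, "lam": 7, "eta": 7, "n": 6}
print([t["id"] for t in spf_order([t1, t2])])  # [1, 2]
print([t["id"] for t in sjf_order([t1, t2])])  # [2, 1]
```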


5.4  The  Solution 

The composition of the time-based scheduling of a task instance and the priority assignment of task instances is shown in Figure 5. The priority assignment can be done by using SLsF, SPF, or SJF. The function Schedule_An_Instance() is invoked to schedule a single task instance.


6  Experimental  Evaluation 

We conduct two experiments to study and compare the performance of the three algorithms. The purpose of the first experiment is to study the effect of the number of tasks and the utilization on the schedulability of each algorithm. The objective of the second experiment is to compare the performance of the three algorithms.

Figure 5: A schematic flowchart for the solution (outcomes: "All instances are scheduled" and "Some instance is unscheduled")


6.1  The  First  Experiment 

The  task  generation  scheme  for  the  first  experiment  is  characterized  by  the  following  parameters. 

•  Periods of the tasks: We consider a homogeneous system in which the period of one task is either the same as or a multiple of the period of another. We consider a system with 40, 80, 160, 320, and 640 as the candidate periods. There may be more than one task with the same period.

•  The execution time of a task, $e_i$: It has the uniform distribution over the range [0,^], where $p_i$ is the period of the task $\tau_i$. The execution time can be a real value.

•  The jitters of a task: $\lambda_i = \eta_i = 0.1 \times p_i$.

We define the utilization of a task system as

$U = \sum_{i=1}^{n} e_i / p_i$ (24)

In the first experiment, the utilization value and the number of tasks in a set are the controlled variables. Given a utilization value U and the number of tasks N, the scheme first generates a run of raw data by randomly generating a set of N tasks based on the selected periods, jitter values, and execution time distribution. The utilization of the raw data, u, is then computed by Equation 24. Finally, the utilization of the raw data is scaled up or down to U by multiplying the execution time of each generated task by U/u. As a consequence, we obtain a set of tasks with the specified (U, N) value.
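The first experiment's generator can be sketched as follows. The upper bound of the execution-time distribution is left as a hypothetical parameter `max_exec_fraction`, and Equation 24 is taken to be $\sum_i e_i/p_i$; the final loop performs the scaling by $U/u$.

```python
import random

PERIODS = [40, 80, 160, 320, 640]  # candidate periods of the first experiment


def generate_task_set(U, N, max_exec_fraction=1.0, rng=random):
    """Generate N tasks whose utilization (sum of e_i / p_i) is exactly U.

    max_exec_fraction is hypothetical: raw e_i ~ Uniform[0, max_exec_fraction * p_i].
    """
    tasks = []
    for _ in range(N):
        p = rng.choice(PERIODS)
        e = rng.uniform(0, max_exec_fraction * p)  # real-valued execution time
        tasks.append({"p": p, "e": e, "lam": 0.1 * p, "eta": 0.1 * p})
    u = sum(t["e"] / t["p"] for t in tasks)  # raw utilization (assumed Equation 24)
    for t in tasks:
        t["e"] *= U / u  # scale the raw utilization u to the target U
    return tasks
```

Because the jitters depend only on the periods here, the scaling step changes the execution times without touching the jitters.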

For each combination of (U, N) in which U = 5%, 10%, 15%, ..., 100% and N = 10, 20, and 30, we apply the scheme to generate 5000 cases of input data and use the three algorithms to solve them. The schedulability degree of each (U, N) combination for an algorithm is obtained by dividing the number of solved cases by 5000. Since the jitter values are 1/10 of the periods, ordering the tasks by jitter is the same as ordering them by period, and it is observed that the SPF and SJF algorithms yield the same results. The results are shown in Figure 6.

As can be seen in Figures 6(a) and (b), the number of tasks has different effects on the three algorithms. For SLsF, given a fixed utilization value, the schedulability degree increases as the number of tasks in a system becomes larger. This is because the execution time of a task becomes smaller as the number of tasks increases (the fixed utilization is spread over more tasks), and for a task system with smaller execution times, the chance for SLsF to find a feasible solution is higher. The same phenomenon is also found in Figure 6(b) for SPF and SJF in the low-utilization cases (i.e., U < 20%). However, for the high-utilization cases in Figure 6(b), the complexity introduced by the larger number of tasks dominates, and the schedulability decreases.

Figure 6: The effect of the number of tasks on the schedulability


6.2  The  Second  Experiment 

The  task  generation  scheme  for  the  second  experiment  is  characterized  by  the  following  parameters. 

•  LCM  =  300 

•  The  number  of  tasks  is  20. 

•  Periods  of  the  tasks:  We  consider  the  factors  of  the  LCM  as  the  periods.  They  are  20,  30, 
50,  60,  100,  150,  and  300.  There  may  be  more  than  one  task  with  the  same  period. 


•  The execution time of a task, $e_i$: It has the uniform distribution over the range [0,^], where $p_i$ is the period of the task $\tau_i$. The execution time can be a real value.

•  The jitters of a task: $\lambda_i = \eta_i = 0.1 \times p_i + 2 \times e_i$.

The generation scheme for the second experiment is similar to the first one. Given a utilization value U, a set of 20 tasks is randomly generated according to the parameters listed above, and then the execution time of each task is normalized to make the utilization value exactly equal to U.

We  generate  5000  cases  of  different  task  sets  for  each  utilization  value  ranging  from  0.05  to  1.00. 
The  schedulability  degree  of  each  algorithm  on  a  particular  utilization  value  is  obtained  by  dividing 
the  number  of  solved  cases  by  5000.  We  compare  the  schedulability  degrees  of  the  algorithms  on 
different  utilization  values.  The  results  are  shown  in  Figure  7(a). 

As can be seen in Figure 7(a), the SLsF algorithm outperforms the other two algorithms. For example, when the utilization is 50%, the schedulability degree of SLsF is 0.575, while those of SPF and SJF are less than 0.2. This is because the way SLsF assigns priorities to the task instances reflects their urgency by considering the latest start times.

We also compare the objective function value in Equation 21 among the three algorithms. We define the normalized objective function for an algorithm A as

$\bar{\pi}_A = \frac{1}{5000} \sum_{i=1}^{5000} x_A(i)$, (25)

where

$x_A(i) = 1$ if the algorithm cannot find a feasible solution to case i,
$x_A(i) = 0$ if $max(i) = min(i)$,
$x_A(i) = \frac{\pi_A(i) - min(i)}{max(i) - min(i)}$ otherwise,

and $\pi_A(i)$ is the objective value obtained by A on case i. Given case i, the values of min(i) and max(i) are calculated among the objective values obtained from the algorithms that solve the case. For the algorithms that cannot find a feasible solution to case i, the objective values are not taken into account when min(i) and max(i) are calculated.
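The normalized objective can be computed from a table of per-case results. This sketch assumes min-max normalization for the "otherwise" case of Equation 25, which is consistent with its two boundary cases (0 when all solving algorithms tie, 1 when an algorithm fails); `None` marks an unsolved case, and the algorithm names are only dictionary keys.

```python
def normalized_objective(results):
    """results: one dict per case mapping algorithm -> objective value (None if unsolved).

    Returns algorithm -> average normalized value over all cases.
    """
    algs = list(results[0].keys())
    totals = {a: 0.0 for a in algs}
    for case in results:
        solved = [v for v in case.values() if v is not None]  # min/max over solvers only
        lo = min(solved) if solved else None
        hi = max(solved) if solved else None
        for a, v in case.items():
            if v is None:
                x = 1.0                   # no feasible solution
            elif hi == lo:
                x = 0.0                   # max(i) = min(i)
            else:
                x = (v - lo) / (hi - lo)  # assumed min-max normalization
            totals[a] += x
    return {a: totals[a] / len(results) for a in algs}
```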

The  results  of  the  normalized  objective  functions  for  each  algorithm  on  different  utilization  values 
are  shown  in  Figure  7(b). 

It is observed that in the low-utilization cases SJF finds feasible solutions with smaller objective values. This is because SJF schedules the tasks with the smallest jitters first: by scheduling the tasks with smaller jitter values first, it is easier to keep the instances of a task one period apart, so a feasible solution with a smaller objective value can be found. However, in the medium- and high-utilization cases, the schedulability dominates the normalized objective function (an unsolved case contributes the maximum value of 1), and SLsF outperforms the other two algorithms in these regions.

Figure 7: The comparison of three algorithms


7  Summary 

In this paper we have considered static non-preemptive scheduling on a single processor for a set of periodic tasks with hybrid timing constraints. The time-based scheduling algorithm is used to schedule a task instance once the scheduling window of the instance is given. We have also presented three priority assignment algorithms for the task instances and conducted experiments to compare their performance. From the experimental results, we see that SLsF outperforms the other two algorithms.

The techniques presented in this chapter can be applied to multiprocessor real-time systems. Communication and synchronization constraints can also be incorporated. In our future work, the extension to distributed computing systems will be investigated.
Schedule_An_Instance($\tau_i^j$):

Input: A linked list, a task instance $\tau_i^j$, and a sequence of sorted free intervals, $S = \{int_1, int_2, \ldots, int_{|S|}\}$, in which each interval overlaps $[est(\tau_i^j), lst(\tau_i^j)]$.

Let the execution time of $\tau_i^j$ be e.
For n = 1 to |S| do
    Let $int_n$ be $[\ell, \mu]$.
    If $\mu - \ell \ge e$ then
        Return a new time slot with start time = $\ell$ and finish time = $\ell + e$.
    End if.
End for.
Compute the left laxity and right laxity of each time slot in the linked list by Equations 22 and 23.
For n = 1 to |S| do
    Let $int_n$ be $[\ell, \mu]$.
    If $\ell > s_i^{j-1} + p_i$ then    /* Try left shift first, then right shift */
        Let the time slot that immediately precedes $int_n$ be $w_k$.
        If $left\_laxity(w_k) + \mu - \ell \ge e$ then    /* Left shift */
            Left_Shift($w_k$, $e - \mu + \ell$).
            Return a new time slot with start time = $\mu - e$ and finish time = $\mu$.
        Else
            Let the time slot that immediately follows $int_n$ be $w_k$.
            If $right\_laxity(w_k) + \mu - \ell \ge e$ then    /* Right shift */
                Right_Shift($w_k$, $e - \mu + \ell$).
                Return a new time slot with start time = $\ell$ and finish time = $\ell + e$.
            End if.
        End if.
    Else    /* Try right shift first, then left shift */
        Let the time slot that immediately follows $int_n$ be $w_k$.
        If $right\_laxity(w_k) + \mu - \ell \ge e$ then    /* Right shift */
            Right_Shift($w_k$, $e - \mu + \ell$).
            Return a new time slot with start time = $\ell$ and finish time = $\ell + e$.
        Else
            Let the time slot that immediately precedes $int_n$ be $w_k$.
            If $left\_laxity(w_k) + \mu - \ell \ge e$ then    /* Left shift */
                Left_Shift($w_k$, $e - \mu + \ell$).
                Return a new time slot with start time = $\mu - e$ and finish time = $\mu$.
            End if.
        End if.
    End if.
End for.
Schedule $\tau_i^j$ at the end of the linked list.

Table 3: The Scheduling of a Task Instance
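The control flow of Table 3 can be sketched without the linked list by passing in, for each free interval, the laxity of its neighbouring slots. This is an illustrative sketch, not the authors' code: the inputs are hypothetical precomputed arrays, the shift procedures are represented only by the returned action, a missing neighbour is modelled by a laxity of 0, and the direction test (try the left shift first when the ideal start $s_i^{j-1} + p_i$ lies before $\ell$) is an assumption.

```python
def choose_slot(S, e, ideal, left_lax, right_lax):
    """Sketch of the control flow of Table 3.

    S:         sorted free intervals [(l, mu), ...]
    e:         execution time of the instance
    ideal:     s_i^{j-1} + p_i, one period after the preceding instance
    left_lax:  left_lax[n] = left laxity of the slot just before S[n]
    right_lax: right_lax[n] = right laxity of the slot just after S[n]
               (0 where no such slot exists; an assumption)
    Returns (start, finish, action), where action is None, ("left", units) or
    ("right", units); returns None when the instance goes at the end of the list.
    """
    for l, mu in S:                  # first pass: plain fit
        if mu - l >= e:
            return (l, l + e, None)
    for n, (l, mu) in enumerate(S):  # second pass: sliding
        need = e - (mu - l)          # extra room the shift must create
        # Assumed direction test: try left first if the ideal start lies before l.
        order = ("left", "right") if l > ideal else ("right", "left")
        for side in order:
            if side == "left" and left_lax[n] >= need:
                return (mu - e, mu, ("left", need))
            if side == "right" and right_lax[n] >= need:
                return (l, l + e, ("right", need))
    return None
```

The test `left_lax[n] >= need` is the table's condition $left\_laxity(w_k) + \mu - \ell \ge e$ rearranged, and `need` is the shift amount $e - \mu + \ell$ passed to Left_Shift or Right_Shift.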
Abstract

High-speed networks, such as ATM networks, are expected to support diverse quality-of-service (QoS) requirements, including real-time QoS. Real-time QoS is required by many applications, such as voice and video. To support such service, routing protocols based on the Virtual Circuit (VC) model have been proposed. However, these protocols do not scale well to large networks in terms of storage and communication overhead.

In this paper, we present a scalable VC routing protocol. It is based on the recently proposed viewserver hierarchy, where each viewserver maintains a partial view of the network. By querying these viewservers, a source can obtain a merged view that contains a path to the destination. The source then sends a request packet over this path to set up a real-time VC through resource reservations. The request is blocked if the setup fails. We compare our protocol to a simple approach using simulation. Under this simple approach, a source maintains a full view of the network. In addition to the savings in storage, our results indicate that our protocol performs close to or better than the simple approach in terms of VC carried load and blocking probability over a wide range of real-time workloads.


Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network Architecture and Design—packet networks; store and forward networks; C.2.2 [Computer-Communication Networks]: Network Protocols—protocol architecture; C.2.m [Routing Protocols]; F.2.m [Computer Network Routing Protocols].
1 Introduction


Integrated services packet-switched networks, such as Asynchronous Transfer Mode (ATM) networks [21], are expected to carry a wide variety of applications with heterogeneous quality of service (QoS) requirements. For this purpose, new resource allocation algorithms and protocols have been proposed, including link scheduling, admission control, and routing. Link scheduling defines how the link bandwidth is allocated among the different services. Admission control defines the criteria the network uses to decide whether to accept or reject a new incoming application. Routing concerns the selection of routes to be taken by application packets (or cells) to reach their destination. In this paper, we are mainly concerned with routing for real-time applications (e.g., voice, video) requiring QoS guarantees (e.g., bandwidth and delay guarantees).

To provide real-time QoS support, a number of virtual-circuit (VC) routing approaches have been proposed. A simple (or straightforward) approach to VC routing is the link-state full-view approach. Here, each end-system maintains a view of the whole network, i.e., a graph with a vertex for every node¹ and an edge between every two neighbor nodes. QoS information such as delay, bandwidth, and loss rate is attached to the vertices and the edges of the view. This QoS information is flooded regularly to all end-systems to update their views. When a new application requests service from the network, the source end-system uses its current view to select a source route to the destination end-system that is likely to support the application's requested QoS, i.e., a sequence of node ids starting from the source end-system and ending with the destination end-system. A VC-setup message is then sent over the selected source route to try to reserve the necessary resources (bandwidth, buffer space, service priority) and establish a VC.

Typically, at every node the VC-setup message visits, a set of admission control tests is performed to decide whether the new VC, if established, can be guaranteed its requested QoS without violating the QoS guaranteed to already established VCs. At any node, if these admission tests are passed, then resources are reserved and the VC-setup message is forwarded to the next node. On the other hand, if the admission tests fail, a VC-rejected message is sent back towards the source node, releasing the resource reservations made by the VC-setup message, and the application request is either blocked or another source route is selected and tried. If the final admission tests at the destination node are passed, then a VC-established message is sent back towards the source node, confirming the resource reservations made during the forward trip of the VC-setup message. Upon receiving the VC-established message, the application can start transmitting its packets over its reserved VC. This VC is torn down and its resources are released at the end of the transmission.

¹ We refer to switches and end-systems collectively as nodes.
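The hop-by-hop VC setup described above can be sketched with a bandwidth-only admission test. This is a simplification for illustration: real admission tests also cover delay, buffer space, and service priority, and here each directed link of the source route stands in for the node that owns it. A failed test releases the reservations already made, mirroring the VC-rejected message.

```python
def setup_vc(route, capacity, reserved, bw):
    """Try to reserve bw on every link of the source route (sketch).

    route: list of node ids; capacity and reserved: dict (u, v) -> bandwidth.
    Returns True if the VC is established; on failure, partial reservations are released.
    """
    done = []
    for link in zip(route, route[1:]):
        if reserved.get(link, 0) + bw > capacity[link]:  # admission test fails
            for prev_link in done:                       # VC-rejected: release
                reserved[prev_link] -= bw
            return False
        reserved[link] = reserved.get(link, 0) + bw      # tentative reservation
        done.append(link)
    return True                                          # VC-established


def teardown_vc(route, reserved, bw):
    """Release the VC's reservations at the end of the transmission."""
    for link in zip(route, route[1:]):
        reserved[link] -= bw
```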

Clearly,  the  above  simple  routing  scheme  does  not  scale  up  to  large  networks.  The  storage  at 
each  end-system  and  the  communication  cost  are  proportional  to  N  x  d,  where  N  is  the  number  of 
nodes  and  d  is  the  average  number  of  neighbors  to  a  node. 

A traditional solution to this scaling problem is the area hierarchy used in routing protocols such as the Open Shortest Path First (OSPF) protocol [18]. The basic idea is to aggregate nodes hierarchically into areas: “close” nodes are aggregated into level 1 areas, “close” level 1 areas are aggregated into level 2 areas, and so on. An end-system maintains a view that contains the nodes in the same level 1 area, the level 1 areas in the same level 2 area, and so on. Thus an end-system maintains a smaller view than it would in the absence of hierarchy. Each area has its own QoS information derived from that of its subareas. A major problem of an area-based scheme is that aggregation results in losing detailed link-level QoS information. This decreases the chance of the routing algorithm choosing “good” routes, i.e., routes that result in a high VC setup success rate (or, equivalently, a high carried VC load).

Our  scheme 

In this paper, we present a scalable VC routing scheme that does not suffer from the problems of areas. Our scheme is based on the viewserver hierarchy we recently proposed in [3, 2] for large internetworks and evaluated for administrative policy constraints. Here, we are concerned with the support of performance/QoS requirements in large wide-area ATM-like networks, and we adapt our viewserver protocols accordingly.

In our scheme, views are not maintained by every end-system but by special switches called viewservers. For each viewserver, there is a subset of nodes around it, referred to as the viewserver's precinct. The viewserver only maintains the view of its precinct. This solves the scaling problem for the storage requirement.

A viewserver can provide source routes for VCs between source and destination end-systems in its precinct. Obtaining a route between a source and a destination that are not in any single view involves accumulating the views of a sequence of viewservers. To make this process efficient, viewservers are organized hierarchically in levels, and an associated addressing structure is used. Each end-system has a set of addresses. Each address is a sequence of viewserver ids of decreasing levels, starting at the top level and going towards the end-system. The idea is that when the views of the viewservers in an address are merged, the merged view contains routes to the end-system from the top-level viewservers.

We handle dynamic topology changes such as node/link failures and repairs, and link cost changes. Nodes detect topology changes affecting themselves and their neighbor nodes. Each node communicates these changes by flooding to the viewservers in a specified subset of nodes; this subset is referred to as its flood area. Hence, the number of packets used during flooding is proportional to the size of the flood area. This solves the scaling problem for the communication requirement.

Thus our VC routing protocol consists of two subprotocols: a view-query protocol between end-systems and viewservers for obtaining merged views, and a view-update protocol between nodes and viewservers for updating views.

Evaluation 

In this paper, we compare our viewserver-based VC routing scheme to the simple scheme using VC-level simulation. In our simulation model, we define network topologies, QoS requirements, viewserver hierarchies, and evaluation measures. To the best of our knowledge, this is the first evaluation of a dynamic hierarchical VC routing scheme under a real-time workload.

Our evaluation measures are the amount of memory required at the end-systems, the amount of time needed to construct a path¹, the carried VC load, and the VC blocking probability. We use network topologies each of size 2764 nodes. Our results indicate that our viewserver-based VC routing scheme performs close to or better than the simple scheme in terms of carried VC load and blocking probability over a wide range of workloads. It also reduces the memory requirement by up to two orders of magnitude.

Organization  of  the  paper 

In Section 2, we survey recent approaches to VC routing. In Section 3, we present the view-query protocol for static network conditions, that is, assuming all links and nodes of the network remain operational. In Section 4, we present the view-update protocol to handle topology changes. In Section 5, we present our evaluation model. Our results are presented in Section 6. Section 7 concludes the paper.


¹ We use the terms route and path interchangeably.


2  Related  Work 


In this section, we discuss routing protocols recently proposed for packet-switched QoS networks. These routing protocols can be classified depending on whether they help the network support qualitative QoS or quantitative (real-time) QoS. For a qualitative QoS, the network tries to provide the service requested by the application with no performance guarantees. Such a service is often identified as best-effort. A quantitative QoS provides performance guarantees (typically required by real-time applications); for example, an upper bound on the end-to-end delay for any packet received at the destination.

Routing  protocols  that  make  routing  decisions  on  a  per  VC  basis  can  be  used  to  provide  either 
qualitative  or  quantitative  QoS.  For  a  quantitative  QoS,  some  admission  control  tests  should  be 
performed  during  the  VC-setup  message’s  trip  to  the  destination  to  try  to  reserve  resources  along 
the  VC’s  path  as  described  in  Section  1. 

On  the  other  hand,  the  use  of  routing  protocols  that  make  routing  decisions  on  a  per  packet 
basis  is  problematic  in  providing  resource  guarantees  [5],  and  qualitative  QoS  is  the  best  service 
the  network  can  offer. 

Since we are concerned in this paper with real-time QoS, we limit the following discussion to VC routing schemes proposed or evaluated in this context. We refer the reader to [19, 6] for a good survey of many other routing schemes.

Most of the VC routing schemes proposed for real-time QoS networks are based on the link-state full-view approach described in Section 1 [6, 1, 10, 24]. Recall that in this approach, each end-system maintains a view of the whole network, i.e., a graph with a vertex for every node and an edge between every two neighbor nodes. QoS information is attached to the vertices and the edges of the view. This QoS information is distributed regularly to all end-systems to update their views and thus enable the selection of appropriate source routes for VCs, i.e., routes that are likely to meet the requested QoS. The proposed schemes mainly differ in how this QoS information is used. Generally, a cost function is defined in terms of the QoS information and used to estimate the cost of a path to the VC's destination. The route selection algorithm then favors short paths with minimum cost. See [17, 22] for an evaluation of several schemes.

A number of VC routing schemes have also been designed for networks using the Virtual Path (VP) concept [15, 14]. The VP concept has been proposed to simplify network management and control by having separate (logically) fully-connected subnetworks, typically one for each service class. In each VP subnetwork, simple routing schemes that only consider one-hop and two-hop paths are used. However, the advantage of using VPs can be offset by a decrease in the statistical multiplexing gains of the subnetworks [15]. In this work, we are interested in general network topologies, where the shortest paths can be of arbitrary hop length and the overhead of routing protocols is of much concern.

All the above VC routing schemes are based on the link-state approach. VC routing schemes based on the path-vector approach have also been proposed [13]. In this approach, for each destination a node maintains a set of paths, one through each of its neighbor nodes. QoS information is attached to these paths. For each destination, a node exchanges its best feasible path² with its neighbor nodes. The scheme in [13] provides two kinds of routes: pre-computed and on-demand. Pre-computed routes match some well-known QoS requirements and are maintained using the path-vector approach. On-demand routes are calculated for specific QoS requirements upon request. In this calculation, the source broadcasts a special packet over all candidate paths. The destination then selects a feasible path from them and informs the source [13, 23]. One drawback of this scheme is that obtaining on-demand routes is very expensive, since there is a potentially exponential number of candidate paths between the source and the destination.

The link-state approach is often proposed and favored over the path-vector approach in QoS architectures for several reasons [16]. An obvious reason is simplicity and the complete control of the source over QoS route selection.

The  above  VC  routing  schemes  do  not  scale  well  to  large  QoS  networks  in  terms  of  storage 
and  communication  requirements.  Several  techniques  to  achieve  scaling  exist.  The  most  common 
technique  is  the  area  hierarchy  described  in  Section  1. 

The landmark hierarchy [26, 25] is another approach for solving the scaling problem. The link-state approach cannot be used with the landmark hierarchy. A thorough study of enforcing QoS and policy constraints with this hierarchy has not been done.

Finally,  we  should  point  out  that  extensive  effort  is  currently  underway  to  fully  specify  and 
standardize  VC  routing  schemes  for  the  future  integrated  services  Internet  and  ATM  networks  [9]. 

3  Viewserver  Hierarchy  Query  Protocol 

In  this  section,  we  present  our  scheme  for  static  network  conditions,  that  is,  all  links  and  nodes 
remain  operational.  The  dynamic  case  is  presented  in  Section  4. 

² A feasible path is a path that satisfies the QoS constraints of the nodes in the path.


Conventions: Each node has a unique id. NodeIds denotes the set of node-ids. For a node u, we use nodeid(u) to denote the id of u. NodeNeighbors(u) denotes the set of ids of the neighbors of u.

In our protocol, a node u uses two kinds of sends. The first kind has the form "Send(m) to v", where m is the message being sent and v is the destination-id. Here, nodes u and v are neighbors, and the message is sent over the physical link (u, v). If the link is down, we assume that the packet is dropped.

The second kind of send has the form "Send(m) to v using sr", where m and v are as above and sr is a source route between u and v. We assume that as long as there is a sequence of up links connecting the nodes in sr, the message is delivered to v. This requires transport protocol support such as TCP [20].

To implement both kinds of sends, we assume there is a reserved VC on each link for sending routing, signaling and control messages [4]. This also ensures that routing messages do not degrade the QoS seen by applications.

Views  and  Viewservers 

Views are maintained by special nodes called viewservers. Each viewserver has a precinct, which is a set of nodes around the viewserver. A viewserver maintains a view consisting of the nodes in its precinct, links between these nodes and links outgoing from the precinct³. Formally, a viewserver x maintains the following:

Precinct_x ⊆ NodeIds. Nodes whose view is maintained.

View_x. View of x:

$$\mathit{View}_x = \{(u, \mathit{timestamp}, \mathit{expirytime}, \{(v, \mathit{cost}) : v \in \mathit{NodeNeighbors}(u)\}) : u \in \mathit{Precinct}_x\}$$

The intention of View_x is to obtain source routes between nodes in Precinct_x. Hence, the choice of nodes to include in Precinct_x and the choice of links to include in View_x are not arbitrary. Precinct_x and View_x must be connected; that is, between any two nodes in Precinct_x, there should be a path in View_x. Note that View_x can contain links to nodes outside Precinct_x. We say that a node u is in the view of a viewserver x if either u is in the precinct of x, or View_x has a link from a node in the precinct of x to node u. Note that the precincts and views of different viewservers can be overlapping, identical or disjoint.
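The connectivity requirement on Precinct_x and View_x, and the definition of "in the view", can be checked mechanically. The following Python sketch is illustrative only: it assumes a view is represented as a set of directed links (u, v) whose tail u lies in the precinct (mirroring the per-node neighbor lists of View_x), and the function names are hypothetical.

```python
from collections import deque


def nodes_in_view(precinct, view_links):
    """Nodes 'in the view' of x: its precinct plus heads of links leaving it."""
    return set(precinct) | {v for (u, v) in view_links if u in precinct}


def precinct_connected(precinct, view_links):
    """True if every pair of precinct nodes is joined by a path in the view."""
    adj = {}
    for u, v in view_links:
        adj.setdefault(u, set()).add(v)
    for start in precinct:
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first search over view links only
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if not set(precinct) <= seen:
            return False
    return True


# Hypothetical precinct {a, b, c}; the link c->d leaves the precinct.
links = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")}
print(sorted(nodes_in_view({"a", "b", "c"}, links)))  # ['a', 'b', 'c', 'd']
print(precinct_connected({"a", "b", "c"}, links))     # True
```

Node d is in the view of x even though it is outside the precinct, which matches the remark that View_x can contain links to nodes outside Precinct_x.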


³ Not all the links need to be included.


For a link (u, v) in the view of a viewserver x, View_x stores a cost. The cost of the link (u, v) equals a vector of values if the link is known to be up; each cost value estimates how expensive it is to cross the link according to some QoS criterion such as delay, throughput, or loss rate. The cost equals ∞ if the link is known to be down. The cost of a link changes with time (see Section 4). The view also includes timestamp and expirytime fields, which are described in Section 4.


Viewserver  Hierarchy 

For scaling reasons, we cannot have one large view. Thus, obtaining a source route between a source and a destination that are far apart involves accumulating the views of a sequence of viewservers. To keep this process efficient, we organize viewservers hierarchically. More precisely, each viewserver is assigned a hierarchy level from 0, 1, ..., with 0 being the top level in the hierarchy. A parent-child relationship between viewservers is defined as follows:

1.  Every  level  i  viewserver,  i  >  0,  has  a  parent  viewserver  whose  level  is  less  than  i. 

2. If viewserver x is a parent of viewserver y, then x's precinct contains y and y's precinct contains x.

3.  The  precinct  of  a  top  level  viewserver  contains  all  other  top  level  viewservers. 
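The three rules above are easy to check for a concrete hierarchy. Below is a minimal Python sketch, assuming (hypothetically) that a hierarchy is given as three dictionaries: viewserver levels, parent sets, and precincts. It returns a list of rule violations rather than raising, so that all problems are reported at once.

```python
def hierarchy_violations(level, parents, precinct):
    """Check rules 1-3 of the viewserver hierarchy; return a list of problems.

    level:    viewserver id -> hierarchy level (0 is the top level)
    parents:  viewserver id -> set of parent viewserver ids
    precinct: viewserver id -> set of node ids in its precinct
    """
    problems = []
    for y, lvl in level.items():
        ps = parents.get(y, set())
        # Rule 1: a level i > 0 viewserver has some parent of level < i.
        if lvl > 0 and not any(level[p] < lvl for p in ps):
            problems.append(f"rule 1: {y} has no lower-level parent")
        # Rule 2: parent and child lie in each other's precinct.
        for p in ps:
            if y not in precinct[p] or p not in precinct[y]:
                problems.append(f"rule 2: {p}/{y} not in each other's precinct")
    # Rule 3: each top-level precinct contains all other top-level viewservers.
    top = {x for x, lvl in level.items() if lvl == 0}
    for x in sorted(top):
        missing = top - {x} - precinct[x]
        if missing:
            problems.append(f"rule 3: {x} is missing {sorted(missing)}")
    return problems


# Hypothetical two-top-level hierarchy: z, y at level 0; v at 1; u at 2.
level = {"z": 0, "y": 0, "v": 1, "u": 2}
parents = {"v": {"z"}, "u": {"v"}}
precinct = {"z": {"z", "y", "v"}, "y": {"y", "z"},
            "v": {"v", "z", "u"}, "u": {"u", "v"}}
print(hierarchy_violations(level, parents, precinct))  # []
```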

In the hierarchy, a parent can have many children and a child can have many parents. We extend the range of the parent-child relationship to ordinary nodes; that is, if Precinct_x contains the node u, we say that u is a child of x, and x is a parent of u. We assume that there is at least one parent viewserver for each node.

For a node u, an address is defined to be a sequence ⟨x_0, x_1, ..., x_t⟩ such that x_i for i < t is a viewserver-id, x_0 is a top-level viewserver-id, x_t is the id of u, and x_i is a parent of x_{i+1}. A node may have many addresses since the parent-child relationship is many-to-many. If a source node wants to establish a VC to a destination node, it first queries the name servers to obtain a set of addresses for the destination⁴. Second, it queries viewservers to obtain an accumulated view containing both itself and the destination node (it can reach its parent viewservers by using fixed source routes to them). Then, it chooses a feasible source route from this accumulated view and initiates the VC setup protocol on this path.
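Because the parent-child relationship is many-to-many, the set of addresses of a node can be enumerated by walking parent links up to the top level. The Python sketch below assumes a hypothetical `parents` dictionary and a set of top-level viewserver ids. Recursion terminates because every parent has a strictly smaller level than its child (rule 1).

```python
def addresses(node, parents, top_level):
    """Enumerate all addresses <x_0, ..., x_t> of `node`, with x_t = node.

    parents maps a node or viewserver id to its parent viewserver ids;
    top_level is the set of level-0 viewserver ids.
    """
    if node in top_level:
        return [[node]]
    result = []
    for p in parents.get(node, ()):
        for prefix in addresses(p, parents, top_level):
            result.append(prefix + [node])  # x_i is a parent of x_{i+1}
    return result


# Hypothetical hierarchy: u has two parents (v and w), both children of z.
parents = {"s": ["u"], "u": ["v", "w"], "v": ["z"], "w": ["z"]}
print(addresses("s", parents, {"z"}))
# [['z', 'v', 'u', 's'], ['z', 'w', 'u', 's']]
```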

View-Query  Protocol:  Obtaining  Source  Routes 
We  now  describe  how  a  source  route  is  obtained. 


⁴ Querying the name servers can be done in the same way as is currently done in the Internet.


We want a sequence of viewservers whose merged views contain both the source and the destination nodes. Addresses provide a way to obtain such a sequence, by first going up in the viewserver hierarchy starting from the source node and then going down in the viewserver hierarchy towards the destination node. More precisely, let ⟨s_0, ..., s_t⟩ be an address of the source, and ⟨d_0, ..., d_l⟩ be an address of the destination. Then, the sequence ⟨s_{t-1}, ..., s_0, d_0, ..., d_{l-1}⟩ meets our requirements. In fact, going up all the way in the hierarchy to the top-level viewservers may not be necessary. We can stop going up at a viewserver s_i if there is a viewserver d_j, j < l, in the view of s_i (one special case is where s_i = d_j).

The  view-query  protocol  uses  two  message  types: 

• (RequestView, s_address, d_address)

where s_address and d_address are the addresses of the source and the destination, respectively. A RequestView message is sent by a source node to obtain an accumulated view containing both the source and the destination nodes. When a viewserver receives a RequestView message, it either sends back its view or forwards the request to another viewserver.

• (ReplyView, s_address, d_address, accumview)

where s_address and d_address are as above and accumview is the accumulated view. A ReplyView message is sent by a viewserver to the source or to another viewserver closer to the source. The accumview field in a ReplyView message equals the union of the views of the viewservers the message has visited.

We  now  describe  the  view-query  protocol  in  more  detail  (please  refer  to  Figures  1  and  2).  To 
establish  a  VC  to  a  destination  node,  the  source  node  sends  a  RequestView  packet  containing  the 
source  and  the  destination  addresses  to  its  parent  in  the  source  address. 

Upon receiving a RequestView packet, a viewserver x checks if the destination node is in its precinct⁵. If it is, x sends back its view in a ReplyView packet. If it is not, x forwards the request packet to another viewserver as follows (details in Figure 2): x checks whether any viewserver in the destination address is in its view. If there is such a viewserver, x sends the RequestView packet to the last such one in the destination address. Otherwise, x is a viewserver in the source address, and it sends the packet to its parent in the source address.
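The forwarding rule can be traced with a small simulation of the static case. The Python sketch below is a hypothetical helper, not the specification itself: it follows the prose in considering only the viewservers d_0, ..., d_{l-1} of the destination address, and it represents each view simply as the set of nodes in the view of x. The example precincts are invented for illustration.

```python
def request_view_route(s_addr, d_addr, precinct, view, max_hops=64):
    """Viewservers visited by a RequestView packet (static case, Figure 2).

    s_addr = [s_0, ..., s_t], d_addr = [d_0, ..., d_l]; precinct[x] and
    view[x] are the precinct of x and the set of nodes in the view of x.
    """
    x = s_addr[-2]  # the source first contacts its parent s_{t-1}
    visited = [x]
    while d_addr[-1] not in precinct[x]:
        # Indices j < l such that viewserver d_j is in the view of x.
        hits = [j for j in range(len(d_addr) - 1) if d_addr[j] in view[x]]
        if hits:
            x = d_addr[max(hits)]  # the last such one in the address
        else:
            x = s_addr[s_addr.index(x) - 1]  # go up: parent in source address
        visited.append(x)
        if len(visited) > max_hops:
            raise RuntimeError("no progress; check the hierarchy rules")
    return visited


# Hypothetical views that contain only the precinct of each viewserver.
precinct = {"u": {"u", "v", "s"}, "v": {"v", "z", "u", "x"},
            "x": {"x", "v", "d"}, "z": {"z", "v"}}
view = precinct
print(request_view_route(["z", "v", "u", "s"], ["z", "v", "x", "d"],
                         precinct, view))  # ['u', 'v', 'x']
```

Here the request stops going up at v, since v already sees the viewserver x of the destination address, so the top-level viewserver z is never contacted.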

When a viewserver x receives a ReplyView packet, it merges its view into the accumulated view in the packet. Then it sends the ReplyView packet towards the source node in the same way it would send a RequestView packet towards the destination node (i.e., the roles of the source address and the destination address are interchanged).

⁵ Even though the destination can be in the view of x, its QoS characteristics are not in the view if it is not in the precinct of x.


Constants

FixedRoutes_u(x), for every viewserver-id x such that x is a parent of u,
  = {⟨y_1, ..., y_n⟩ : y_i ∈ NodeIds}. Set of routes to x.

Events

RequestView_u(s_address, d_address)  {Executed when u wants a source route}
  Let s_address be ⟨s_0, ..., s_t⟩, and sr ∈ FixedRoutes_u(s_{t-1});
  Send(RequestView, s_address, d_address) to s_{t-1} using sr

Receive_u(ReplyView, s_address, d_address, accumview)
  Choose a feasible source route using accumview;
  If a feasible route is not found
    Execute RequestView_u again with another source address and/or destination address


Figure 1: View-query protocol: Events and state of a source node u.


Constants

Precinct_x. Precinct of x.

Variables

View_x. View of x.

Events

Receive_x(RequestView, s_address, d_address)
  Let d_address be ⟨d_0, ..., d_l⟩;
  if d_l ∉ Precinct_x then
    forward_x(RequestView, s_address, d_address, {});
  else forward_x(ReplyView, d_address, s_address, View_x);  {addresses are switched}
  endif

Receive_x(ReplyView, s_address, d_address, view)
  forward_x(ReplyView, s_address, d_address, view ∪ View_x)

where procedure forward_x(type, s_address, d_address, view)
  Let s_address be ⟨s_0, ..., s_t⟩, d_address be ⟨d_0, ..., d_l⟩;
  if ∃i : d_i in View_x then
    Let i = max{j : d_j in View_x};
    target := d_i;
  else target := s_i such that s_{i+1} = nodeid(x);
  endif;
  sr := choose a route to target from nodeid(x) using View_x;
  if type = RequestView then
    Send(RequestView, s_address, d_address) to target using sr;
  else Send(ReplyView, s_address, d_address, view) to target using sr;
  endif


Figure 2: View-query protocol: Events and state of a viewserver x.


When the source receives a ReplyView packet, it chooses a feasible path using the accumview in the packet. If it does not find a feasible path, it can try again using a different source and/or destination address. Note that the source does not have to throw away the previous accumulated views; it can merge them all into a richer accumulated view. In fact, it is easy to change the protocol so that the source can also obtain views of individual viewservers to make the accumulated view even richer. Once a feasible source route is found, the source node initiates the VC setup protocol.
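Merging accumulated views and choosing a route from them can be sketched as follows. This is a hypothetical Python illustration: each view maps a node to `(timestamp, {neighbor: cost})`, and when two views hold an entry for the same node, the sketch keeps the one with the larger timestamp (an assumption of ours, borrowed from the timestamp rule of Section 4). The `link_ok` predicate stands in for whatever QoS test defines feasibility; by default a link only has to be up (cost < ∞).

```python
import math
from collections import deque


def merge_views(*views):
    """Merge views mapping node -> (timestamp, {neighbor: cost}).

    Assumption: for duplicate nodes, the most recent entry wins.
    """
    merged = {}
    for view in views:
        for node, (ts, links) in view.items():
            if node not in merged or ts > merged[node][0]:
                merged[node] = (ts, dict(links))
    return merged


def feasible_route(accumview, src, dst, link_ok=lambda cost: cost < math.inf):
    """Minimum-hop route over links that are up and pass link_ok, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:  # rebuild the route by following predecessors
            route = []
            while u is not None:
                route.append(u)
                u = prev[u]
            return route[::-1]
        _, links = accumview.get(u, (None, {}))
        for v, cost in links.items():
            if v not in prev and cost < math.inf and link_ok(cost):
                prev[v] = u
                queue.append(v)
    return None


view_u = {"s": (1, {"u": 1.0}), "u": (1, {"s": 1.0, "v": 1.0})}
view_x = {"v": (1, {"u": 1.0, "x": 1.0}), "x": (1, {"v": 1.0, "d": 1.0})}
print(feasible_route(merge_views(view_u, view_x), "s", "d"))
# ['s', 'u', 'v', 'x', 'd']
```

If a later view reports the link (u, v) as down with a newer timestamp, the merged view no longer yields a route, and the source would retry with other addresses as described above.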

Above we have described one possible way of obtaining the accumulated views. There are various other possibilities, for example: (1) restricting the ReplyView packet to take the reverse of the path that the RequestView packet took; (2) having ReplyView packets go all the way up in the viewserver hierarchy for a richer accumulated view; (3) having the source poll the viewservers directly instead of the viewservers forwarding request/reply messages to each other; (4) not including non-transit nodes (e.g., end-systems) other than the source and the destination nodes in the accumview; and (5) including some QoS requirements in the RequestView packet, and having the viewservers filter out some nodes and links.

4  Update  Protocol  for  Dynamic  Network  Conditions 

In this section, we first describe how topology changes, such as link/node failures, repairs and cost changes, are detected and communicated to viewservers, i.e., the view-update protocol. Then, we modify the view-query protocol appropriately.

View-Update  Protocol:  Updating  Views 

Viewservers  do  not  communicate  with  each  other  to  maintain  their  views.  Nodes  detect  and 
communicate  topology  changes  to  viewservers.  Updates  are  done  periodically  and  also  optionally 
after  a  change  in  the  outgoing  link  costs. 

The communication between a node and viewservers is done by flooding over a set of nodes. This set is referred to as the flood area. The topology of a flood area must be a connected graph. For efficiency, the flood area can be implemented by a hop-count.
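A hop-count flood area is simply the set of nodes within a given number of hops of the originating node. Since that set is built from a breadth-first search tree, it is automatically a connected graph, as required above. A minimal Python sketch, with a hypothetical `neighbors` adjacency dictionary:

```python
from collections import deque


def flood_area_by_hops(neighbors, origin, hops):
    """All nodes within `hops` hops of origin (a hop-count flood area)."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue  # do not expand beyond the hop limit
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


# Hypothetical line topology a - b - c - d - e.
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
        "d": ["c", "e"], "e": ["d"]}
print(sorted(flood_area_by_hops(line, "c", 1)))  # ['b', 'c', 'd']
```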

Due to the nature of flooding, a viewserver can receive information from a node out of order. In order to avoid old information replacing new information, each node includes successively increasing timestamps in the messages it sends. The timestamp field in the view of a viewserver equals the largest timestamp received from each node.


Due to node and link failures, communication between a node and a viewserver can fail, resulting in the viewserver having out-of-date information. To eliminate such information, a viewserver deletes any information about a node if it is older than a time-to-die period. The expirytime field in the view of a viewserver equals the end of the time-to-die period for a node. We assume that nodes send messages more often than the time-to-die value (to avoid false removal).

The  view-update  protocol  uses  one  type  of  message  as  follows: 

• (Update, nid, timestamp, floodarea, ncostset)

is sent by the node to inform the viewservers about the current costs of its outgoing links. Here, nid and timestamp indicate the id and the timestamp of the node, ncostset contains a cost for each outgoing link of the node, and floodarea is the set of nodes that this message is to be sent over.

Constants:

FloodArea_g (⊆ NodeIds). The flood area of the node.

Variables:

Clock_g : Integer. Clock of g.

Figure 3: State of a node g.

The state maintained by a node g is listed in Figure 3. We assume that consecutive reads of Clock_g return increasing values.

Constants:

Precinct_x. Precinct of x.

TimeToDie_x : Integer. Time-to-die value.

Variables:

View_x. View of x.

Clock_x : Integer. Clock of x.

Figure 4: State of a viewserver x.

The  state  maintained  by  a  viewserver  x  is  listed  in  Figure  4. 

The events of node g are specified in Figure 5. The events of a viewserver x are specified in Figure 6. When a viewserver x recovers, View_x is set to {}. Its view becomes up-to-date as it receives new information from nodes (and false information is removed by the time-to-die period).


Update_g  {Executed periodically and also optionally upon a change in outgoing link costs}
  ncostset := compute costs for each outgoing link;
  flood_g((Update, nodeid(g), Clock_g, FloodArea_g, ncostset));

Receive_g(packet)  {an Update packet}
  flood_g(packet)

where procedure flood_g(packet)
  if nodeid(g) ∈ packet.floodarea then
    {remove g from the flood area to avoid infinite exchange of the same message}
    packet.floodarea := packet.floodarea − {nodeid(g)};
    for all h ∈ NodeNeighbors(g) ∧ h ∈ packet.floodarea do
      Send(packet) to h;
  endif


Node Failure Model: A node can undergo failures and recoveries at any time. We assume failures are fail-stop (i.e., a failed node does not send erroneous messages).


Figure  5:  View-update  protocol:  Events  of  a  node  g. 
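The effect of flood_g can be simulated on a small graph. The Python sketch below is a hypothetical model that assumes reliable links and processes packet copies in FIFO order; each copy carries its own floodarea set, exactly as in Figure 5. Because the set shrinks along every path, flooding terminates, although a node may receive more than one copy when the flood area contains cycles.

```python
from collections import deque


def simulate_flood(neighbors, origin, flood_area):
    """Simulate flood_g (Figure 5) from `origin` over reliable links.

    Returns (set of nodes that received the Update, number of link sends).
    """
    received, sends = set(), 0
    pending = deque([(origin, frozenset(flood_area))])  # (node, its copy)
    while pending:
        g, area = pending.popleft()
        if g not in area:
            continue
        area = area - {g}  # remove g to avoid endless re-forwarding
        for h in neighbors[g]:
            if h in area:
                sends += 1
                received.add(h)
                pending.append((h, area))
    return received, sends


line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(simulate_flood(line, "a", {"a", "b", "c"}))  # ({'b', 'c'}, 2) in some set order
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(simulate_flood(triangle, "a", {"a", "b", "c"})[1])  # 4
```

Node d lies outside the flood area in the first example and never receives the Update, which is how the flood area bounds the communication cost.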


Receive_x(Update, nid, ts, FloodArea, ncset)
  if nid ∈ Precinct_x then
    if ∃(nid, timestamp, expirytime, ncostset) ∈ View_x ∧ ts > timestamp then
      {received is more recent; delete the old one}
      delete (nid, timestamp, expirytime, ncostset) from View_x;
    endif
    if ¬∃(nid, timestamp, expirytime, ncostset) ∈ View_x then
      ncostset := subset of edge-cost pairs in ncset that are in View_x;
      insert (nid, ts, Clock_x + TimeToDie_x, ncostset) to View_x;
    endif
  endif

Delete_x  {Executed periodically to delete entries older than the time-to-die period}
  for all (nid, tstamp, expirytime, ncset) ∈ View_x ∧ expirytime < Clock_x do
    delete (nid, tstamp, expirytime, ncset) from View_x;

Viewserver Failure Model: A viewserver can undergo failures and recoveries at any time. We assume failures are fail-stop. When a viewserver x recovers, View_x is set to {}.


Figure  6:  View  update  events  of  a  viewserver  x. 
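The timestamp and time-to-die rules of Figure 6 can be sketched as a small class. In this hypothetical Python model, `view` maps a node id to `(timestamp, expirytime, ncostset)`, and `links` is an assumed input listing the (u, v) links this viewserver keeps in its view; it plays the role of "edge-cost pairs that are in View_x". Equal or older timestamps are ignored, as in the figure.

```python
class ViewServer:
    """Sketch of the view-update events of a viewserver (Figure 6)."""

    def __init__(self, precinct, links, time_to_die):
        self.precinct = set(precinct)
        self.links = set(links)          # links kept in this view (assumed)
        self.time_to_die = time_to_die
        self.view = {}                   # nid -> (timestamp, expiry, costs)

    def receive_update(self, nid, ts, ncset, clock):
        if nid not in self.precinct:
            return
        entry = self.view.get(nid)
        if entry is not None and ts > entry[0]:
            del self.view[nid]           # received information is more recent
            entry = None
        if entry is None:
            kept = {(nid, v): c for v, c in ncset.items() if (nid, v) in self.links}
            self.view[nid] = (ts, clock + self.time_to_die, kept)

    def delete_expired(self, clock):
        for nid in [n for n, (_, exp, _) in self.view.items() if exp < clock]:
            del self.view[nid]

    def recover(self):
        self.view = {}                   # fail-stop recovery: empty view


vs = ViewServer({"a", "b"}, {("a", "b"), ("a", "c")}, time_to_die=10)
vs.receive_update("a", 5, {"b": 1.0, "c": 2.0, "z": 9.0}, clock=0)
vs.receive_update("a", 3, {"b": 7.0}, clock=1)   # older: ignored
print(vs.view["a"])  # (5, 10, {('a', 'b'): 1.0, ('a', 'c'): 2.0})
```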

Changes  to  View-Query  Protocol 

We  now  enumerate  the  changes  needed  to  adapt  the  view-query  protocol  to  the  dynamic  case  (the 
formal  specification  is  omitted  for  space  reasons). 

Due to link and node failures, RequestView and ReplyView packets can get lost. Hence, the source may never receive a ReplyView packet after it initiates a request. Thus, the source should try again after a time-out period.
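A source-side retry loop might look like the following Python sketch. The transport call `query(s_address, d_address)` is hypothetical: it is assumed to return the accumview of a ReplyView, or None once the time-out period expires. Cycling through address pairs combines the time-out rule with the earlier suggestion to retry with a different source and/or destination address.

```python
def obtain_accumulated_view(address_pairs, query, max_attempts):
    """Retry RequestView after time-outs, cycling through address pairs.

    query(s_address, d_address) is a hypothetical transport call returning
    the accumview of a ReplyView, or None if the time-out period expires.
    """
    for attempt in range(max_attempts):
        s_address, d_address = address_pairs[attempt % len(address_pairs)]
        accumview = query(s_address, d_address)
        if accumview is not None:
            return accumview
    return None


# Usage with a stub transport that times out twice before replying.
calls = []


def flaky_query(s_address, d_address):
    calls.append((s_address, d_address))
    return None if len(calls) < 3 else {"accumview": "merged view"}


pairs = [(["z", "v", "u", "s"], ["z", "v", "x", "d"]),
         (["z", "w", "u", "s"], ["z", "v", "x", "d"])]
result = obtain_accumulated_view(pairs, flaky_query, max_attempts=5)
print(result, len(calls))  # {'accumview': 'merged view'} 3
```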

When a viewserver receives a RequestView message, it should reply with its view only if the destination node is in its precinct and its view contains a path to the destination. Similarly, during forwarding of RequestView and ReplyView packets, a viewserver, when checking whether a node is in its view, should also check whether its view contains a path to it.

5  Evaluation 

In this section, we present the parameters of our simulation model. We use this model to compare our viewserver-based VC routing protocols to the simple approach. The results obtained are presented in Section 6.

Network  Parameters 

We model a campus network which consists of a campus backbone subnetwork and several department subnetworks. The backbone network consists of backbone switches and backbone links.

Each department network consists of a hub switch and several non-hub switches. Each non-hub switch has a link to the department's hub switch, and the department's hub switch has a link to one of the backbone switches. A non-hub switch can have links to other non-hub switches in the same department, to non-hub switches in other departments, or to backbone switches.

End-systems are connected to non-hub switches. An example network topology is shown in Figure 7.

In  our  topology,  there  are  8  backbone  switches  and  32  backbone  links.  There  are  16  departments. 
There  is  one  hub-switch  in  each  department.  There  is  a  total  of  240  non-hub  switches  randomly 
assigned  to  different  departments.  There  are  2500  end-systems  which  are  randomly  connected  to 
non-hub  switches.  Thus,  we  have  a  total  of  2764  nodes. 

In  addition  to  the  links  connecting  non-hub  switches  to  the  hub  switches  and  hub  switches  to 
the  backbone  switches,  there  are  720  links  from  non-hub  switches  to  non-hub  switches  in  the  same 
department,  there  are  128  links  from  non-hub  switches  to  non-hub  switches  in  different  departments, 
and  there  are  64  links  from  non-hub  switches  to  backbone  switches. 

The end-points of each link are chosen randomly. However, we make sure that the backbone network is connected, and that there is a link from node u to node v iff there is a link from node v to node u.

[Figure 7: a backbone connecting Department 1 and Department 2; legend: backbone switches, hub switches, non-hub switches, end-systems.]

Figure 7: An example network topology.
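A random campus topology with the stated sizes can be generated as in the Python sketch below. One interpretation is ours and should be treated as an assumption: 8 backbone switches admit at most 28 distinct undirected links, so the sketch reads the link counts as counting each direction separately (32 backbone links become 16 bidirectional pairs; likewise 720, 128 and 64 become 360, 64 and 32 pairs). Backbone connectivity is guaranteed by first laying down a ring, which is our choice, not necessarily the authors'. Node names such as `B0` or `E7` are hypothetical.

```python
import random


def campus_topology(seed=0, n_backbone=8, backbone_pairs=16, n_depts=16,
                    n_nonhub=240, n_ends=2500, intra_pairs=360,
                    inter_pairs=64, nonhub_backbone_pairs=32):
    """Random campus topology; returns (adjacency dict, end-system list)."""
    rng = random.Random(seed)
    backbone = [f"B{i}" for i in range(n_backbone)]
    hubs = [f"H{i}" for i in range(n_depts)]
    nonhub = [f"N{i}" for i in range(n_nonhub)]
    ends = [f"E{i}" for i in range(n_ends)]
    adj = {node: set() for node in backbone + hubs + nonhub + ends}

    def add_link(u, v):
        """Add u<->v unless it is a self-loop or already present."""
        if u == v or v in adj[u]:
            return False
        adj[u].add(v)
        adj[v].add(u)  # a link u->v exists iff v->u exists
        return True

    def add_random(target, pick):
        """Draw candidate pairs until `target` new links have been added."""
        added = 0
        while added < target:
            added += add_link(*pick())

    dept = {n: rng.randrange(n_depts) for n in nonhub}  # random assignment
    members = {d: [n for n in nonhub if dept[n] == d] for d in range(n_depts)}

    # Backbone: a ring keeps it connected; random chords fill up the count.
    for i in range(n_backbone):
        add_link(backbone[i], backbone[(i + 1) % n_backbone])
    add_random(backbone_pairs - n_backbone, lambda: rng.sample(backbone, 2))

    for d, hub in enumerate(hubs):
        add_link(hub, rng.choice(backbone))  # hub -> one backbone switch
        for n in members[d]:
            add_link(n, hub)                 # every non-hub -> its hub

    def same_dept():
        group = members[dept[rng.choice(nonhub)]]
        return rng.sample(group, 2) if len(group) > 1 else (group[0], group[0])

    def other_dept():
        a, b = rng.sample(nonhub, 2)
        return (a, b) if dept[a] != dept[b] else (a, a)  # (a, a) is rejected

    add_random(intra_pairs, same_dept)
    add_random(inter_pairs, other_dept)
    add_random(nonhub_backbone_pairs,
               lambda: (rng.choice(nonhub), rng.choice(backbone)))
    for e in ends:
        add_link(e, rng.choice(nonhub))      # each end-system -> one non-hub
    return adj, ends


adj, ends = campus_topology(seed=1)
print(len(adj))  # 2764 = 8 + 16 + 240 + 2500
```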

Each link has a total of C units of bandwidth.

QoS  and  Workload  Parameters 

In our evaluation model, we assume that a VC requires the reservation of a certain amount of bandwidth that is enough to ensure an acceptable QoS for the application. This reservation amount can be thought of either as the peak transmission rate of the VC or as its "effective bandwidth" [12], which varies between the peak and average transmission rates.

VC setup requests arrive at the network according to a Poisson process of rate λ, each requiring one unit of bandwidth. Each VC, once it is successfully set up, has a lifetime of exponential duration with mean 1/μ. The source and the destination end-systems of a VC are chosen randomly.

An arriving VC is admitted to the network if at least one feasible path between its source and destination end-systems is found by the routing protocol, where a feasible path is one whose links all have non-zero available capacity. From the set of feasible paths, a minimum-hop path is used to establish the VC; one unit of bandwidth is allocated on each of its links for the lifetime of the VC. On the other hand, if a feasible path is not found, then the arriving VC is blocked and lost.
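The workload and admission rules above can be turned into a small VC-level simulation. The Python sketch below assumes full, instantaneously updated knowledge of available capacities (i.e., it models the simple full-view scheme under the update assumption stated next), reserves bandwidth on directed links, and estimates the blocking probability as the fraction of requests for which no feasible path exists. The function names are hypothetical.

```python
import heapq
import random
from collections import deque


def min_hop_feasible_path(adj, avail, src, dst):
    """BFS over directed links with non-zero available capacity."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and avail[(u, v)] > 0:
                prev[v] = u
                queue.append(v)
    return None


def blocking_probability(adj, end_systems, capacity, lam, mu, n_requests, seed=0):
    """Fraction of VC requests blocked under the workload model above."""
    rng = random.Random(seed)
    avail = {(u, v): capacity for u in adj for v in adj[u]}
    departures = []  # min-heap of (departure time, links held by the VC)
    now, blocked = 0.0, 0
    for _ in range(n_requests):
        now += rng.expovariate(lam)  # Poisson arrivals: exponential gaps
        while departures and departures[0][0] <= now:
            _, links = heapq.heappop(departures)
            for link in links:
                avail[link] += 1  # release one unit on every link
        src, dst = rng.sample(end_systems, 2)
        path = min_hop_feasible_path(adj, avail, src, dst)
        if path is None:
            blocked += 1  # the VC is blocked and lost
            continue
        links = list(zip(path, path[1:]))
        for link in links:
            avail[link] -= 1  # reserve one unit on every link of the path
        heapq.heappush(departures, (now + rng.expovariate(mu), links))
    return blocked / n_requests


# Hypothetical line network e1 - s1 - s2 - e2 with two end-systems.
adj = {"e1": {"s1"}, "s1": {"e1", "s2"}, "s2": {"s1", "e2"}, "e2": {"s2"}}
print(blocking_probability(adj, ["e1", "e2"], 0, 1.0, 1.0, 50))     # 1.0
print(blocking_probability(adj, ["e1", "e2"], 1000, 1.0, 1.0, 50))  # 0.0
```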

We assume that the available link capacities in the views of the viewservers are updated instantaneously whenever a VC is admitted to the network or terminates.


Viewserver  Hierarchy  Schemes 

We  have  evaluated  our  viewserver  protocol  for  several  different  viewserver  hierarchies  and  query 
methods.  We  next  describe  the  different  viewserver  schemes  evaluated.  Please  refer  to  Figure  7  in 
the  following  discussion. 

The first viewserver scheme is referred to as base. Each switch is a viewserver. A viewserver’s
precinct consists of itself and its neighboring nodes. The links in the viewserver’s view consist of
the links between the nodes in the precinct, and the links outgoing from nodes in the precinct to nodes
not in the precinct. For example, the precinct of viewserver u consists of nodes u, v, w, s.
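A short sketch of the base precinct and view definitions, assuming the topology is an adjacency dictionary with directed link pairs; the graph below is a hypothetical fragment, not Figure 7. Note that "links between precinct nodes" plus "links leaving the precinct" together are exactly the links outgoing from precinct nodes.

```python
def base_precinct(adj, u):
    """Precinct of viewserver u in the base scheme: u itself plus its neighbours."""
    return {u} | set(adj[u])


def base_view_links(adj, u):
    """Links in u's view: links among precinct nodes plus links leaving the precinct.

    Together these are exactly the links outgoing from precinct nodes.
    """
    precinct = base_precinct(adj, u)
    return {(a, b) for a in precinct for b in adj[a]}


# Hypothetical fragment: u is adjacent to v, w and s; v also reaches z.
adj = {"u": ["v", "w", "s"], "v": ["u", "z"], "w": ["u"], "s": ["u"], "z": ["v"]}
precinct = base_precinct(adj, "u")
view = base_view_links(adj, "u")
```

Here the view contains the link (v, z) leaving the precinct, but not the reverse link (z, v), since z is outside the precinct.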

As for the viewserver hierarchy, a backbone switch is a level 0 viewserver, a hub switch is a
level 1 viewserver, and a non-hub switch is a level 2 viewserver. The parent of a hub switch viewserver
is the backbone switch viewserver it is connected to. The parent of a non-hub switch viewserver is the
hub switch viewserver in its department. The parent of an end-system is the non-hub switch viewserver
it is connected to.

We use only one address for each end-system. The viewserver-address of an end-system is the
concatenation of four ids. Thus, the address of s is z.v.u.s. Similarly, the address of d is z.v.x.d.
To obtain a route between s and d, it suffices to obtain the views of viewservers u, v, x.

The second viewserver scheme is referred to as base-QT (where QT stands for “query up
to top”). It is identical to base except that during the query protocol all the viewservers in the
source and the destination addresses are queried. That is, to obtain a route between s and d, the
views of u, v, x, z are obtained.
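The difference between base and base-QT can be sketched as a function of the two addresses. The QT rule (every viewserver in both addresses) is stated above; the base rule used here (query the viewservers below the longest common address prefix, plus the deepest shared viewserver) is only one reading that is consistent with the s/d example, not the paper's stated query protocol. The function name is hypothetical.

```python
def viewservers_to_query(src_addr, dst_addr, query_to_top=False):
    """Return the set of viewservers whose views are merged for a route.

    src_addr, dst_addr: viewserver-addresses as tuples ordered from the
    top-level viewserver down to the end-system, e.g. ("z", "v", "u", "s").
    query_to_top=True models the QT variants (query every viewserver in
    both addresses); False models one assumed reading of the base rule.
    """
    src_vs, dst_vs = src_addr[:-1], dst_addr[:-1]  # drop the end-systems
    if query_to_top:
        return set(src_vs) | set(dst_vs)
    common = 0
    while (common < min(len(src_vs), len(dst_vs))
           and src_vs[common] == dst_vs[common]):
        common += 1
    start = max(common - 1, 0)  # keep the deepest shared viewserver, if any
    return set(src_vs[start:]) | set(dst_vs[start:])


s_addr = ("z", "v", "u", "s")
d_addr = ("z", "v", "x", "d")
base_set = viewservers_to_query(s_addr, d_addr)                   # {u, v, x}
qt_set = viewservers_to_query(s_addr, d_addr, query_to_top=True)  # {u, v, x, z}
```

Under this reading, two end-systems attached to the same non-hub switch need only that switch's view, which matches the minimum of one viewserver queried in the base scheme.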

The third viewserver scheme is referred to as vertex-extension. It is identical to base except
that viewserver precincts are extended as follows. Let P denote the precinct of a viewserver in the
base scheme. For each node u in P, if there is a link from node u to node v and v is not in P, node
v is added to the precinct; among v’s links, only the ones to nodes in P are added to the view. In
the example, nodes z, y, x and others are added to the precinct of u, but the outgoing links of these nodes to
other nodes are not included (e.g. (x, p) and (r, q) are not included). The advantage of this scheme
is that even though it increases the precinct size by a factor of d (where d is the average number of
neighbors of a node), it increases the number of links stored in the view by a factor of less than 2.
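The precinct extension can be sketched as below, again over an adjacency dictionary with directed link pairs. The graph is a hypothetical fragment chosen to mirror the (x, p) case in the text: x joins the precinct, its link back into P is kept, and its link onward to p is not.

```python
def vertex_extension(adj, u):
    """Return (precinct, view_links) of viewserver u in the vertex-extension scheme."""
    base = {u} | set(adj[u])                        # P: precinct in the base scheme
    links = {(a, b) for a in base for b in adj[a]}  # view links in the base scheme
    precinct = set(base)
    for a in base:
        for v in adj[a]:
            if v not in base:
                precinct.add(v)
                # Among v's links, keep only those back to nodes in P.
                links |= {(v, w) for w in adj[v] if w in base}
    return precinct, links


# Hypothetical fragment: u - v - x - p, plus u - w.
adj = {"u": ["v", "w"], "v": ["u", "x"], "w": ["u"],
       "x": ["v", "p"], "p": ["x"]}
precinct, links = vertex_extension(adj, "u")
```

Each added node contributes only its links into P, which is why the link count grows by less than a factor of 2 even though the precinct grows by roughly a factor of d.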

The fourth viewserver scheme is referred to as vertex-extension-QT. It is identical to vertex-extension
except that during the query protocol all the viewservers in the source and the destination
addresses are queried.


6  Numerical  Results 

6.1  Results  for  Network  1 

The  parameters  of  the  first  network  topology,  referred  to  as  Network  1,  are  given  in  Section  5.  The 
link  capacity  C  is  taken  to  be  20  [6],  i.e.  a  link  is  capable  of  carrying  20  VCs  simultaneously. 

Our evaluation measures were computed for a (randomly chosen but fixed) set of 100,000 VC
setup requests. Table 1 lists for each viewserver scheme (1) the minimum, average and maximum
of the precinct sizes (in number of nodes), (2) the minimum, average and maximum of the merged
view sizes (in number of nodes), and (3) the minimum, average and maximum of the number of
viewservers queried.


Scheme                 Precinct Size        Merged View Size      No. of Viewservers Queried
                       (min / avg / max)    (min / avg / max)     (min / avg / max)
base                   5 / 16.32 / 28       4 / 56.46 / 81        1 / 5.49 / 6
base-QT                5 / 16.32 / 28       27 / 59.96 / 81       6 / 6.00 / 6
vertex-extension       22 / 88.11 / 288     14 / 155.86 / 199     1 / 5.49 / 6
vertex-extension-QT    22 / 88.11 / 288     113 / 163.28 / 199    6 / 6.00 / 6

Table  1:  Precinct  sizes,  merged  view  sizes,  and  number  of  viewservers  queried  for  Network  1. 


The precinct size indicates the memory requirement at a viewserver. More precisely, the memory
requirement at a viewserver is O(precinct size × d), except for the vertex-extension and
vertex-extension-QT schemes. In these schemes, the memory requirement is increased by a factor of less
than two. Hence these schemes have the same order of viewserver memory requirement as the base
and base-QT schemes.

The merged view size indicates the memory requirement at a source end-system during the
query protocol; i.e. the memory requirement at a source end-system is O(merged view size × d),
except for the vertex-extension and vertex-extension-QT schemes. Note that the source end-system
does not need to store information about end-systems other than itself and the destination. The
numbers in Table 1 take advantage of this.

The number of viewservers queried indicates the communication time required to obtain the
merged view at the source end-system. Hence, the “real-time” communication time required to
obtain the merged view at a source is slightly more than one round-trip time between the source
and the destination.


As is apparent from Table 1, using a QT scheme increases the merged view size by about 6%
and the number of viewservers queried by about 9%. Using the vertex-extension scheme increases
the merged view size by a factor of about 3 (note that the amount of actual memory needed increases
only by a factor of less than 2).
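The percentages above follow directly from the averages in Table 1; the short check below recomputes them (values copied from the table, variable names illustrative).

```python
# Average values copied from Table 1 (Network 1).
merged_view = {"base": 56.46, "base-QT": 59.96,
               "vertex-extension": 155.86, "vertex-extension-QT": 163.28}
queried = {"base": 5.49, "base-QT": 6.00}


def pct_increase(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


qt_view_increase = pct_increase(merged_view["base-QT"], merged_view["base"])
qt_query_increase = pct_increase(queried["base-QT"], queried["base"])
vx_view_factor = merged_view["vertex-extension"] / merged_view["base"]

print(f"QT merged view: +{qt_view_increase:.1f}%")              # +6.2%
print(f"QT viewservers queried: +{qt_query_increase:.1f}%")     # +9.3%
print(f"vertex-extension merged view: x{vx_view_factor:.2f}")   # x2.76
```

The factor of 2.76 is what the text rounds to "about 3".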

The above measures show the memory and time requirements of our protocols. They clearly
indicate the savings in storage over the simple approach, as manifested by the smaller view sizes. To
determine whether the viewserver hierarchy finds many feasible paths, other evaluation measures such
as the carried VC load and the percent VC blocking are of interest. They are defined as follows:

• Carried VC load is the average number of VCs carried by the network.

• Percent VC blocking is the percentage of VC setup requests that are blocked because
a feasible path is not found.
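To make the two measures concrete, here is a toy discrete-event sketch of the simple approach (a full view, so every feasible path is visible). It is not the paper's simulator: the topology, seed, parameter values and function names are hypothetical, and links are treated as directed with per-direction capacity. Carried VC load is taken as the time average of the number of active VCs, and percent blocking as blocked requests over all requests.

```python
import heapq
import random
from collections import deque


def min_hop_path(adj, avail, src, dst):
    """BFS over links (a, b) with avail[(a, b)] > 0; returns a node list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent and avail[(node, nbr)] > 0:
                parent[nbr] = node
                queue.append(nbr)
    return None


def simulate(adj, end_systems, capacity, lam, mean_lifetime, n_requests, seed=1):
    """Return (carried VC load, percent VC blocking) for the full-view approach."""
    rng = random.Random(seed)
    avail = {(a, b): capacity for a in adj for b in adj[a]}
    active = []        # min-heap of (departure time, path) for admitted VCs
    now = area = 0.0   # area = integral over time of the number of active VCs
    blocked = 0
    for _ in range(n_requests):
        arrival = now + rng.expovariate(lam)          # Poisson arrivals, rate lam
        while active and active[0][0] <= arrival:     # departures before it
            t, path = heapq.heappop(active)
            area += (len(active) + 1) * (t - now)
            now = t
            for a, b in zip(path, path[1:]):
                avail[(a, b)] += 1
        area += len(active) * (arrival - now)
        now = arrival
        src, dst = rng.sample(end_systems, 2)         # random distinct pair
        path = min_hop_path(adj, avail, src, dst)
        if path is None:
            blocked += 1                              # blocked and lost
            continue
        for a, b in zip(path, path[1:]):
            avail[(a, b)] -= 1
        lifetime = rng.expovariate(1.0 / mean_lifetime)  # mean 1/mu
        heapq.heappush(active, (now + lifetime, path))
    return area / now, 100.0 * blocked / n_requests


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
load, blocking = simulate(adj, ["a", "b", "c"], capacity=1000, lam=1.0,
                          mean_lifetime=10.0, n_requests=20000)
```

With ample capacity nothing is blocked and the carried load approaches λ/μ (about 10 here, by Little's law); shrinking `capacity` makes blocking appear and the carried load fall below λ/μ.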

In our experiments, we keep the average VC lifetime (1/μ) fixed at 15000 and vary the arrival
rate of VC setup requests (λ). Figure 8 shows the carried VC load versus λ for the simple approach
and the viewserver schemes. Figure 9 shows the percent VC blocking versus λ. At low values of λ,
all the viewserver schemes are very close to the simple approach. At moderate values of λ, the base
and base-QT schemes perform badly. The vertex-extension and vertex-extension-QT schemes are
still very close to the simple approach (only 3.4% less carried VC load). Note that the performance
of the viewserver schemes can be further improved by trying more viewserver addresses.

Surprisingly, at high values of λ, all the viewserver schemes perform better than the simple
approach. At λ = 0.5, the network with the base scheme carries about 30% higher load than the
simple approach. This is an interesting result. Our explanation is as follows. Elsewhere [2], we
have found that when the viewserver schemes cannot find an existing feasible path, this path is
usually very long (more than 11 hops). This causes our viewserver hierarchy protocols to reject
VCs that are admitted by the simple approach over long paths. The use of long paths for VCs is
undesirable since it ties up resources at more intermediate nodes, which could instead be used to admit many
shorter VCs.

In conclusion, we recommend the vertex-extension scheme, as it performs close to or better
than all other schemes in terms of carried VC load and blocking probability over a wide range of
workloads. Note that for all viewserver schemes, adding QT yields slightly further improvement.

Recall  that  we  assume  a  blocked  VC  setup  request  is  cleared  (i.e.  lost). 


Figure 8: Carried VC load versus arrival rate for Network 1.


6.2  Results  for  Network  2 

The  parameters  of  the  second  network,  referred  to  as  Network  2,  are  the  same  as  the  parameters 
of  Network  1.  However,  a  different  seed  is  used  for  the  random  number  generation,  resulting  in  a 
different  topology  and  distribution  of  source-destination  end-system  pairs  for  the  VCs. 

We again take C = 20, and we fix 1/μ at 15000. Our evaluation measures were computed for
a set of 100,000 VC setup requests. Table 2 and Figures 10 and 11 show the results. Similar
conclusions to Network 1 hold for Network 2. An interesting exception is that at high values of λ,
we observe that the vertex-extension scheme performs slightly better than the vertex-extension-QT
scheme (about 4.2% higher carried VC load). The reason is the following: adding QT gives richer
merged views, and hence increases the chance of finding a feasible path that is possibly long. As
explained in Section 6.1, this results in performance degradation.


Scheme                 Precinct Size        Merged View Size      No. of Viewservers Queried
                       (min / avg / max)    (min / avg / max)     (min / avg / max)
base                   4 / 16.32 / 33       4 / 57.61 / 80        1 / 5.52 / 6
base-QT                4 / 16.32 / 33       30 / 60.64 / 80       6 / 6.00 / 6
vertex-extension       17 / 90.36 / 282     16 / 159.70 / 214     1 / 5.52 / 6
vertex-extension-QT    17 / 90.36 / 282     113 / 166.97 / 214    6 / 6.00 / 6

Table  2:  Precinct  sizes,  merged  view  sizes,  and  number  of  viewservers  queried  for  Network  2. 


We have repeated the above evaluations for other networks and obtained similar conclusions.

7  Conclusions 

We presented a hierarchical VC routing protocol for ATM-like networks. Our protocol satisfies QoS
constraints, adapts to dynamic topology changes, and scales well to a large number of nodes.

Our protocol uses partial views maintained by viewservers. The viewservers are organized
hierarchically. To set up a VC, the source end-system queries viewservers to obtain a merged view
that contains itself and the destination end-system. This merged view is then used to compute a
source route for the VC.

We evaluated several viewserver hierarchy schemes and compared them to the simple approach.
Our results on 2764-node networks indicate that the vertex-extension scheme performs close to or
better than the simple approach in terms of carried VC load and blocking probability over a wide
range of real-time workloads. It also reduces the memory requirement by up to two orders
of magnitude. We note that our protocol scales even better on larger networks [3].

In all the viewserver schemes we studied, each switch is a viewserver. In practice, not all
switches need to be viewservers. We may associate one viewserver with a group of switches; this is
particularly attractive in ATM networks where each signaling entity is responsible for establishing
VCs across a group of nodes. In such an environment, viewservers and signaling entities can be
combined.

Figure 10: Carried VC load versus arrival rate for Network 2.

Figure 11: Percent VC blocking versus arrival rate for Network 2.

However, there is an advantage of each switch being a viewserver; that is, source nodes do not
require fixed source routes to their parent viewservers (in the view-query protocol). This reduces
the amount of hand configuration required. In fact, the base and base-QT viewserver schemes do
not require any hand configuration.

Our evaluation model assumed that views are instantaneously updated, i.e. there is no delayed feedback
between link cost changes and view/route changes. We plan to investigate the effect of delayed feedback
on the performance of the different schemes. We expect our viewserver schemes to outperform
the simple approach in this more realistic setting, as updating the views of the viewservers requires less
time and communication overhead. Thus, views in our viewserver schemes will be more up-to-date.

As we pointed out in [3], the only drawback of our protocol is that to obtain a source route
for a VC, views are merged at (or prior to) the VC setup, thereby increasing the setup time. This
drawback is not unique to our scheme [8, 16, 7, 11]. Reference [3] describes several ways, including
caching and replication, to reduce the setup overhead and improve performance.
Abstract

Traditional inter-domain routing protocols based on superdomains maintain either “strong”
or “weak” ToS and policy constraints for each visible superdomain. With strong constraints,
a valid path may not be found even though one exists. With weak constraints, an invalid
domain-level path may be treated as a valid path.

We  present  an  inter-domain  routing  protocol  based  on  superdomains,  which  always  finds 
a  valid  path  if  one  exists.  Both  strong  and  weak  constraints  are  maintained  for  each  visible 
superdomain.  If  the  strong  constraints  of  the  superdomains  on  a  path  are  satisfied,  then  the 
path  is  valid.  If  only  the  weak  constraints  are  satisfied  for  some  superdomains  on  the  path,  the 
source  uses  a  query  protocol  to  obtain  a  more  detailed  “internal”  view  of  these  superdomains, 
and  searches  again  for  a  valid  path.  Our  protocol  handles  topology  changes,  including  node/link 
failures  that  partition  superdomains.  Evaluation  results  indicate  our  protocol  scales  well  to  large 
internetworks. 


Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network Architecture
and Design: packet networks; store and forward networks; C.2.2 [Computer-Communication Networks]:
Network Protocols: protocol architecture; C.2.m [Routing Protocols]; F.2.m [Computer Network
Routing Protocols].
type-of-service (ToS) constraints of applications (e.g. low delay, high throughput, high reliability,
minimum monetary cost), each node maintains a cost for each outgoing link and ToS. The intra-domain
routing protocol should choose optimal paths based on these costs.

Across  all  domains,  an  inter-domain  routing  protocol  is  executed  that  provides  routes  between 
source  and  destination  nodes  in  different  domains,  using  the  services  of  the  intra-domain  routing 
protocols  within  domains.  This  protocol  should  have  the  following  properties: 

(1)  It  should  satisfy  the  policy  constraints  of  domains.  To  do  this,  it  must  keep  track  of  the 
policy  constraints  of  domains  [5]. 

(2)  An  inter-domain  routing  protocol  should  also  satisfy  ToS  constraints  of  applications.  To  do 
this,  it  must  keep  track  of  the  ToS  services  offered  by  domains  [5]. 

(3) An inter-domain routing protocol should scale up to very large internetworks, i.e. with a very
large number of domains. Practically, this means that processing, memory and communication
requirements should be much less than linear in the number of domains. It should also
handle non-hierarchical domain interconnections at any level [8] (e.g. we do not want to
hand-configure special routes as “back-doors”).

(4)  An  inter-domain  routing  protocol  should  automatically  adapt  to  link  cost  changes  and  node/link 
failures  and  repairs,  including  failures  that  partition  domains  [13]. 

A  Straight-Forward  Approach 

A straightforward approach to inter-domain routing is domain-level source routing with a link-state
approach [7, 5]. In this approach, each router³ maintains a domain-level view of the internetwork,
i.e., a graph with a vertex for every domain and an edge between every two neighbor domains.
Policy and ToS information is attached to the vertices and the edges of the view.

When a source node needs to reach a destination node, it (or a router³ in the source’s domain)
first examines this view and determines a domain-level source route satisfying ToS and policy
constraints, i.e., a sequence of domain ids starting from the source’s domain and ending with the
destination’s domain. Then packets are routed to the destination using this domain-level source
route and the intra-domain routing protocols of the domains crossed.

For  example,  consider  the  internetwork  of  Figure  2  (each  circle  is  a  domain,  and  each  thin  line 

³ Not all nodes maintain routing tables. A router is a node that maintains a routing table;
it is referred to as the policy server in [7].


1  Introduction 


A computer internetwork, such as the Internet, is an interconnection of backbone networks, regional
networks, metropolitan area networks, and stub networks (campus networks, office networks and
other small networks)¹. Stub networks are the producers and consumers of the internetwork traffic,
while backbones, regionals and MANs are transit networks. Most of the networks in an internetwork
are stub networks. Each network consists of nodes (hosts, routers) and links. A node that has a
link to a node in another network is called a gateway. Two networks are neighbors when there are
one or more links between gateways in the two networks (see Figure 1).


Figure  1:  A  portion  of  an  internetwork.  (Circles  represent  stub  networks.) 

An internetwork is organized into domains². A domain is a set of networks (possibly consisting
of only one network) administered by the same agency. Domains are typically subject to policy
constraints, which are administrative restrictions on inter-domain traffic [7, 11, 8, 5]. The policy
constraints of a domain U are of two types: transit policies, which specify how other domains
can use the resources of U (e.g. $0.01 per packet, no traffic from domain V); and source policies,
which specify constraints on traffic originating from U (e.g. domains to avoid/prefer, acceptable
connection cost). Transit policies of a domain are public (i.e. available to other domains), whereas
source policies are usually private.

Within each domain, an intra-domain routing protocol is executed that provides routes between
source and destination nodes in the domain. This protocol can be any of the typical ones, i.e.,
next-hop or source routes computed using distance-vector or link-state algorithms. To satisfy

¹ For example, NSFNET and MILNET are backbones, and Suranet and CerfNet are regionals.

² Also referred to as routing domains or administrative domains.


is a domain-level interconnection). Suppose a node in d1 desires a connection to a node in d7.
Suppose the policy constraints of d3 and d19 do not allow transit traffic originating from d1. Every
node maintains this information in its view. Thus the source node can choose a valid path from
source domain d1 to destination domain d7 avoiding d3 and d19 (e.g. the thick line in the figure).
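A minimal sketch of this domain-level source-route computation, assuming only transit policies of the form "refuse transit traffic from domain X" (ToS is not modelled) and using breadth-first search for brevity; any path search that skips forbidden transit domains would do. The topology below is hypothetical and is not Figure 2.

```python
from collections import deque


def domain_level_route(domain_adj, src_dom, dst_dom, no_transit_from):
    """Breadth-first search over a domain-level view.

    domain_adj:      dict domain -> list of neighbour domains.
    no_transit_from: dict domain -> set of source domains whose traffic that
                     domain refuses to carry in transit (its transit policy).
    Returns a list of domain ids from src_dom to dst_dom, or None.
    """
    parent = {src_dom: None}
    queue = deque([src_dom])
    while queue:
        dom = queue.popleft()
        if dom == dst_dom:
            route = []
            while dom is not None:
                route.append(dom)
                dom = parent[dom]
            return route[::-1]
        for nbr in domain_adj[dom]:
            if nbr in parent:
                continue
            # The destination is not a transit domain; every other hop is.
            if nbr != dst_dom and src_dom in no_transit_from.get(nbr, set()):
                continue
            parent[nbr] = dom
            queue.append(nbr)
    return None


# Hypothetical topology (not Figure 2): d3 and d19 refuse transit from d1.
domain_adj = {"d1": ["d3", "d19", "d2"], "d2": ["d1", "d5"],
              "d3": ["d1", "d7"], "d19": ["d1", "d7"],
              "d5": ["d2", "d7"], "d7": ["d3", "d19", "d5"]}
no_transit_from = {"d3": {"d1"}, "d19": {"d1"}}
route = domain_level_route(domain_adj, "d1", "d7", no_transit_from)
```

Without the policies, the shortest route would pass through d3; with them, the search detours through d2 and d5.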


Figure 2: An example interdomain topology.

The disadvantage of this straightforward scheme is that it does not scale up for large internetworks.
The storage at each router is proportional to $N_d \times E_d$, where $N_d$ is the number of domains
and $E_d$ is the average number of neighbor domains of a domain. The communication cost for
updating views is proportional to $N_r \times E_r$, where $N_r$ is the number of routers in the internetwork
and $E_r$ is the average number of router neighbors of a router (topology changes are flooded to all routers in
the internetwork).

The  Superdomain  Approach 

To achieve scaling, several approaches based on hierarchically aggregating domains into superdomains
have been proposed [16, 14, 6]. Here, each domain is a level 1 superdomain, “close” level 1
superdomains are grouped into level 2 superdomains, “close” level 2 superdomains are grouped into
level 3 superdomains, and so on (see Figure 3). Each router x maintains a view that contains the
level 1 superdomains in x’s level 2 superdomain, the level 2 superdomains in x’s level 3 superdomain
(excluding x’s level 2 superdomain), and so on. Thus a router maintains a smaller view than
it would in the absence of hierarchy. For the superdomain hierarchy of Figure 3, the views of two
routers (one in domain d1 and one in domain d16) are shown in Figures 4 and 5.

Figure 3: An example of superdomain hierarchy.


Figure 4: View of a router in d1.
Figure 5: View of a router in d16.

The superdomain approach has several problems. One problem is that the aggregation results
in loss of domain-level ToS and policy information. A superdomain is usually characterized by a
single set of ToS and policy constraints derived from the ToS and policy constraints of the domains
in it. Routers outside the superdomain assume that this set of constraints applies uniformly to
each of its children (and by recursion to each domain in the superdomain). If there are domains
with different (possibly contradictory) constraints in a superdomain, then there is no good way of
deriving the ToS and policy constraints of the superdomain.

The usual technique [16] of obtaining ToS and policy constraints of a superdomain is to derive
either a strong set of constraints or a weak set of constraints⁴ from the ToS and policy constraints of
the children superdomains in it. If strong (weak) constraints are used for policies, the superdomain
enforces a policy constraint if that policy constraint is enforced by some (all) of its children. If
strong (weak) constraints are used for ToS constraints, the superdomain is assumed to support a
ToS if that ToS is supported by all (some) of its children. The intention is that if the strong (weak)
constraints of a superdomain are (are not) satisfied, then any (no) path through that superdomain
is valid.

⁴ “Strong” and “weak” are referred to respectively as “union” and “intersection” in [16].

Each approach has problems. Strong constraints can eliminate valid paths, and weak constraints
can allow invalid paths. For example, in Figure 3, d16 allows transit traffic from d1 while d19 does
not; with strong constraints G would not allow transit traffic from d1, and with weak constraints
G would allow transit traffic from d1 to be routed via d19.
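The aggregation rules can be written as set operations, which also explains the "union" and "intersection" names used in [16]. This is a sketch under simplifying assumptions: a transit policy is modelled only as a set of refused source domains, a ToS capability as a set of supported ToS labels, and in the example G is treated as if d16 and d19 were its only domains. The function names are hypothetical.

```python
def aggregate_policies(child_refusals):
    """child_refusals: list of sets, each the source domains a child refuses.

    Strong: the superdomain enforces a refusal enforced by SOME child (union).
    Weak:   it enforces a refusal only if ALL children enforce it (intersection).
    """
    strong = set().union(*child_refusals)
    weak = set.intersection(*child_refusals) if child_refusals else set()
    return strong, weak


def aggregate_tos(child_tos):
    """child_tos: list of sets, each the ToS values a child supports.

    Strong: a ToS is supported only if ALL children support it (intersection).
    Weak:   a ToS is supported if SOME child supports it (union).
    """
    strong = set.intersection(*child_tos) if child_tos else set()
    weak = set().union(*child_tos)
    return strong, weak


# The example in the text, treating d16 and d19 as the only domains in G
# (a simplification): d16 allows transit from d1, d19 does not.
g_strong, g_weak = aggregate_policies([set(), {"d1"}])
```

The strong result refuses d1 outright (losing the valid path through d16), while the weak result refuses nothing (admitting the invalid path through d19).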

Other problems of the superdomain approach are that the varying visibilities of routers complicate
superdomain-level source routing and the handling of node/link failures (especially those that partition
superdomains). The usual technique for solving these problems is to augment superdomain-level
views with gateways [16] (see Section 3).

Our  Contribution 

In this paper, we present an inter-domain routing protocol based on superdomains, which finds
a valid path if and only if one exists. Both strong and weak constraints are maintained for each
visible superdomain. If the strong constraints of the superdomains on a path are satisfied, then
the path is valid. If only the weak constraints are satisfied for some superdomains on the path, the
source uses a query protocol to obtain a more detailed “internal” view of these superdomains, and
searches again for a valid path.
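The per-path decision described above can be sketched as follows, considering transit policies only (ToS checks would follow the same pattern). The function name and data layout are hypothetical; the actual view-query protocol is described later in the paper.

```python
def check_path(path, src, strong_refusals, weak_refusals):
    """Classify a superdomain-level path for traffic from domain src.

    path: transit superdomains on the path (source and destination excluded).
    strong_refusals / weak_refusals: dict superdomain -> set of source domains
    refused under the strong / weak aggregated transit policy.
    Returns ("valid", []), ("query", superdomains to query) or ("invalid", []).
    """
    to_query = []
    for sd in path:
        if src not in strong_refusals[sd]:
            continue                  # strong constraints hold
        if src not in weak_refusals[sd]:
            to_query.append(sd)       # only weak constraints hold: need detail
        else:
            return "invalid", []      # even weak constraints fail
    return ("query", to_query) if to_query else ("valid", [])


strong = {"B": set(), "G": {"d1"}}
weak = {"B": set(), "G": set()}
decision = check_path(["B", "G"], "d1", strong, weak)
```

Here G satisfies only its weak constraints for traffic from d1, so the source would query for G's internal view rather than accept or reject the path.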

We use superdomain-level views with gateways and a link-state view-update protocol to handle
topology changes, including failures that partition superdomains. The storage cost is $O(\log N_d \times \log N_d)$
without the query protocol. We demonstrate the scaling properties of the query protocol
by giving evaluation results based on simulations. Our evaluation results indicate that the query
protocol can be performed using 15% extra space.

Our  protocol  consists  of  two  subprotocols:  a  view-query  protocol  for  obtaining  views  of 
greater  resolution  when  needed;  and  a  view-update  protocol  for  disseminating  topology  changes 
to  the  views. 


Several approaches to scalable inter-domain routing have been proposed, based on the superdomain
hierarchy [1, 14, 16, 9, 6], and the landmark hierarchy [18, 17]. Some of these approaches
suffer  from  loss  of  ToS  and  policy  information  (and  hence  may  not  find  a  valid  path  which  exists). 
Others  are  still  in  a  preliminary  stage.  (Details  in  Section  8.) 

One  important  difference  between  these  approaches  and  ours  is  that  ours  uses  a  query  mechanism 
to  obtain  ToS  and  policy  details  whenever  needed.  In  our  opinion,  such  a  mechanism  is  needed 
to  obtain  a  scalable  solution.  Query  protocols  are  also  being  developed  to  enhance  the  protocols 
in  [9,  6].  Reference  [2]  presents  protocols  based  on  a  new  kind  of  hierarchy,  referred  to  as  the 
viewserver  hierarchy  (more  details  in  Section  8). 

A  preliminary  version  of  the  view-query  protocol  was  proposed  in  reference  [1].  That  version 
differs greatly from the one in this paper. Here, we augment superdomain-level views with gateways.
In [1], we augmented superdomain-level views with superdomain-to-domain edges (details in
Section  8).  Both  versions  have  the  same  time  and  space  complexity,  but  the  protocols  in  this  paper 
are  much  simpler  conceptually.  Also  the  view-update  protocol  is  not  in  reference  [1]. 

Organization  of  the  paper 

In Section 2, we present some definitions used in this paper. In Section 3, we define the view data
structures. In Section 4, we describe how views are affected by topology changes. In Section 5, we
present the view-query protocol. In Section 6, we present the view-update protocol. In Section 7,
we present our evaluation model and the results of its application to the superdomain hierarchy.
In Section 8, we survey recent approaches to inter-domain routing. In Section 9, we conclude and
describe caching and heuristic schemes to improve performance.

2  Preliminaries 

Each domain has a unique id. Let DomainIds denote the set of domain-ids. Each node has a
unique id. Let NodeIds denote the set of node-ids. For a node x, we use domainid(x) to denote
the domain-id of x’s domain.

The superdomain hierarchy defines the following parent-child relationship: a level i, i > 1,
superdomain is the parent of each level i - 1 superdomain it contains. Top-level superdomains
have no parents. Level 1 superdomains, which are just domains, have no children. For any two
superdomains X and Y, X is a sibling of Y iff X and Y have the same parent. X is an ancestor
(descendant) of Y iff X = Y or X is an ancestor (descendant) of Y’s parent (child).

Each router maintains information about a subset of superdomains, referred to as its visible
superdomains. The visible superdomains of a router r are (1) r’s domain itself, (2) siblings of r’s
domain, and (3) siblings of ancestors of r’s domain. In Figure 3, the visible superdomains of a
router in d1 are d1, d2, d3, B, C, G, J (these are shown in Figure 4). Note that if a superdomain U
is visible to a router, then no ancestor or descendant of U is visible to the router.
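The visibility rule can be sketched with a parent map. The map below is hypothetical and only partly consistent with Figure 3: "A" and "H" are invented names for the ancestors of d1, chosen so that the result matches the visible set listed above; None marks a top-level superdomain.

```python
# Hypothetical parent map; "A" and "H" are invented ancestor names.
PARENT = {
    "d1": "A", "d2": "A", "d3": "A",
    "A": "H", "B": "H", "C": "H",
    "H": None, "G": None, "J": None,
}


def ancestors(sd):
    """Ancestors of sd per the paper's definition, which includes sd itself."""
    chain = []
    while sd is not None:
        chain.append(sd)
        sd = PARENT[sd]
    return chain


def siblings(sd):
    """Superdomains other than sd with the same parent (top level included)."""
    return {s for s, p in PARENT.items() if p == PARENT[sd] and s != sd}


def visible_superdomains(domain):
    """The domain itself, its siblings, and the siblings of its ancestors."""
    visible = {domain}
    for anc in ancestors(domain):
        visible |= siblings(anc)
    return visible


vis = visible_superdomains("d1")
```

Because each step adds siblings of one ancestor, the visible set grows with the number of levels times the sibling count, rather than with the total number of domains.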

Each superdomain has a unique id, i.e. unique among all superdomains regardless of level. Let
SuperDomainIds denote the set of superdomain-ids. DomainIds is a subset of SuperDomainIds.
For a superdomain U, let level(U) denote the level of U in the hierarchy, let Ancestors(U) denote
the set of ids of ancestor superdomains of U in the hierarchy, and let Children(U) denote the set
of ids of child superdomains of U in the hierarchy.

For a router x, let VisibleSuperDomains(x) denote the set of ids of superdomains visible from
x.

We extend the above definitions by allowing their arguments to be nodes, in which case the node
stands for its domain. For example, if x is a node in domain d, Ancestors(x) denotes Ancestors(d).

3  Superdomain-Level  Views  with  Gateways 

For routing purposes, each domain (and node) has an address, defined as the concatenation of the
superdomain ids starting from the top level and going down to the domain (node). For example, in
Figure 3, the address of domain d15 is G.E.d15, and the address of a node h in d15 is G.E.d15.h.

When a source node needs to reach a destination node, it first determines the visible superdomain
in the destination address and then, by examining its view, determines a superdomain-level
source route (satisfying ToS and policy constraints) to this superdomain. However, since routers
in different superdomains maintain views of different sets of superdomains, this superdomain-level
source route can be meaningless at some intermediate superdomain’s router x because the next
superdomain in this source route is not visible to x. For example, in Figure 4, the superdomain-level
source route (d2, B, G, C) created at a router in d2 becomes meaningless once the packet is in G,
where C is not visible.
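The first step, finding the visible superdomain in the destination address, is a simple scan of the address from the top level down. In this sketch the parent map covers only G, E and d15 from the address example, and the visible set of a router in d1 is taken from Section 2; the function names are hypothetical.

```python
# Hypothetical parent map covering only the address example G.E.d15.
PARENT = {"G": None, "E": "G", "d15": "E"}


def address(domain):
    """Concatenate superdomain ids from the top level down to the domain."""
    ids = []
    while domain is not None:
        ids.append(domain)
        domain = PARENT[domain]
    return ".".join(reversed(ids))


def visible_in_destination(visible, dest_address):
    """The (at most one) id in the destination address that is visible."""
    for sd in dest_address.split("."):
        if sd in visible:
            return sd
    return None


dest = address("d15") + ".h"                        # node h in d15
router_in_d1 = {"d1", "d2", "d3", "B", "C", "G", "J"}
target = visible_in_destination(router_in_d1, dest)  # "G"
```

Since no ancestor or descendant of a visible superdomain is visible, at most one id in the address can match, so scanning from the top and returning the first match is unambiguous.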


The usual technique of solving this problem is to augment superdomain-level views with gateways
and edges between these gateways.

Define the pair U:g to be an sd-gateway iff U is a superdomain and g is a node that is in U and has a link to a node outside U. Equivalently, we say that g is a gateway of U.

Define (U:g, h) to be an actual-edge iff U:g is an sd-gateway, h is a gateway not in U, and there is a link from g to h.

Define (U:g, h) to be a virtual-edge iff U:g and U:h are sd-gateways and g ≠ h (note that there may not be a link between g and h).

(U:g, h) is an edge iff it is an actual-edge or a virtual-edge. An edge (U:g, h) is also said to be an outgoing edge of U:g. Define the edges of U:g to be the set of edges outgoing from U:g. Define the edges of U to be the set of edges outgoing from any gateway of U.

Let Gateways(U) denote the set of node-ids of gateways of U. Let Edges(U:g) denote the edges of U:g. Note that we never use “edge” as a synonym for link.

A gateway g of a domain can generate many sd-gateways; specifically, U:g for every ancestor U of g's domain such that g has a link to a node outside U. A link (g, h), where g and h are gateways in different domains, can generate many actual-edges; specifically, actual-edge (U:g, h) for every ancestor U of g's domain such that U is not an ancestor of h's domain.
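The two generation rules above can be made concrete with a minimal Python sketch. The encoding of the hierarchy as a child-to-parent map, the helper names, and the example hierarchy (domains d16 and d17 inside E inside G, plus a hypothetical domain d3 inside a hypothetical top-level superdomain B) are illustrative assumptions; following the example in the text, a domain counts as one of its own ancestors.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical hierarchy (child -> parent, None at the top level):
# domains d16 and d17 sit in E, E sits in G; domain d3 sits in B.
PARENT: Dict[str, Optional[str]] = {
    "d16": "E", "d17": "E", "E": "G", "G": None,
    "d3": "B", "B": None,
}


def ancestors(domain: str) -> List[str]:
    """Return [domain, parent, grandparent, ...] up to the top level."""
    chain: List[str] = []
    node: Optional[str] = domain
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def sd_gateways(g: str, g_domain: str, neighbor_domains: List[str]) -> Set[Tuple[str, str]]:
    """sd-gateways U:g: every ancestor U of g's domain that some link of g leaves."""
    result: Set[Tuple[str, str]] = set()
    for u in ancestors(g_domain):
        # A neighbour lies outside U iff U is not one of the neighbour's ancestors.
        if any(u not in ancestors(nd) for nd in neighbor_domains):
            result.add((u, g))
    return result


def actual_edges(g: str, g_domain: str, h: str, h_domain: str) -> Set[Tuple[str, str, str]]:
    """Actual-edges (U:g, h) generated by the link (g, h)."""
    h_ancestors = set(ancestors(h_domain))
    return {(u, g, h) for u in ancestors(g_domain) if u not in h_ancestors}


# Link (g, h) leaves G entirely; link (g, k) stays inside E.
print(sorted(actual_edges("g", "d16", "h", "d3")))
print(sorted(actual_edges("g", "d16", "k", "d17")))
```

The link that leaves G produces three actual-edges, one per ancestor of d16, while the link that stays inside E produces only (d16:g, k).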

For the internetwork topology of Figure 2, the corresponding gateway-level connections are shown in Figure 6, where black rectangles are gateways. For the hierarchy of Figure 3, gateway g in Figure 6 generates sd-gateways d16:g, E:g, and G:g. The link (g, h) in Figure 6 generates actual-edges (d16:g, h), (E:g, h), and (G:g, h).

To a router, at most one of the sd-gateways generated by a gateway g is visible, namely U:g where U is an ancestor of g's domain and U is visible to the router. At most one of the actual-edges generated by a link (g, h) between two gateways in different domains is visible to the router, namely edge (U:g, h) where U:g is visible to the router. None of the actual-edges are visible to the router if g and h are inside a visible superdomain. For example in Figure 3, of the actual-edges generated by link (g, h), only (G:g, h) is visible to a router in d1, and only (d16:g, h) is visible to a router in d16.
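The visibility rule can be sketched the same way. The sets of visible superdomains below are supplied by hand for two hypothetical routers (one in a hypothetical domain d3, one in d16) rather than derived from the definition of VisibleSuperDomains, and the sketch relies on the statement above that at most one ancestor of a domain is visible to a given router.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical hierarchy (child -> parent, None at the top level).
PARENT: Dict[str, Optional[str]] = {
    "d16": "E", "d17": "E", "E": "G", "G": None,
    "d3": "B", "B": None,
}


def ancestors(domain: str) -> List[str]:
    """Return [domain, parent, grandparent, ...] up to the top level."""
    chain: List[str] = []
    node: Optional[str] = domain
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def visible_sd_gateway(g: str, g_domain: str, visible: Set[str]) -> Optional[Tuple[str, str]]:
    """Return U:g for the ancestor U of g's domain that the router sees, if any."""
    for u in ancestors(g_domain):
        if u in visible:  # assumed unique, as stated in the text
            return (u, g)
    return None


def visible_actual_edge(g: str, g_domain: str, h: str, h_domain: str,
                        visible: Set[str]) -> Optional[Tuple[str, str, str]]:
    """Return the single actual-edge of link (g, h) visible to the router, if any."""
    sdg = visible_sd_gateway(g, g_domain, visible)
    if sdg is None:
        return None
    u = sdg[0]
    if u in ancestors(h_domain):
        # g and h are both inside the visible superdomain U.
        return None
    return (u, g, h)


VISIBLE_FROM_D3 = {"d3", "G"}            # hypothetical router in d3
VISIBLE_FROM_D16 = {"d16", "d17", "B"}   # hypothetical router in d16
print(visible_actual_edge("g", "d16", "h", "d3", VISIBLE_FROM_D3))
print(visible_actual_edge("g", "d16", "h", "d3", VISIBLE_FROM_D16))
```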

A router maintains a view consisting of the visible sd-gateways and their outgoing actual- and virtual-edges. An edge (U:g, h) in the view of a router connects the sd-gateway U:g to the sd-gateway V:h such that V:h is visible to the router. For the superdomain-level views of Figures 4 and 5, the new views are shown in Figures 7 and 8, respectively.

Figure 6: Gateway-level connections of internetwork of Figure 2.

Figure 7: View of a router in d1.

Figure 8: View of a router in d16.

The view of a router x contains, for each superdomain U that is visible to x or is an ancestor of x, the strong and weak constraints of U and a set referred to as Gateways&Edges_x(U). This set contains, for each gateway y of U, the edges of U:y and their costs. The reason for storing information about ancestor superdomains is given in Section 5. The cost field is used to satisfy ToS constraints and is described in Section 4. The timestamp field is described in Section 6. Formally, the view of x is defined as follows:


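View_x: view of x.

$$\mathit{View}_x = \{\, (U,\ \mathit{strong\_constraints}(U),\ \mathit{weak\_constraints}(U),\ \mathit{Gateways\&Edges}_x(U)) : U \in \mathit{VisibleSuperDomains}(x) \cup \mathit{Ancestors}(x) \,\}$$

where Gateways&Edges_x(U) denotes the sd-gateways and edges of U:

$$\mathit{Gateways\&Edges}_x(U) = \{\, (y,\ \mathit{timestamp},\ \{ (z, \mathit{cost}) : (U{:}y, z) \in \mathit{Edges}(U{:}y) \}) : y \in \mathit{Gateways}(U) \,\}$$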

ToS  and  policy  constraints  can  also  be  specified  for  each  sd-gateway  and  edge.  Our  protocols 
can  be  extended  to  handle  such  constraints,  but  we  have  not  done  so  here  in  order  to  keep  their 
descriptions  simple. 
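One possible in-memory layout for View_x and Gateways&Edges_x(U), together with the rule that an edge (U:g, h) connects U:g to the visible sd-gateway V:h, is sketched below. The class names, the tuple encoding of cost vectors, and the treatment of constraints as opaque values are assumptions; ancestor entries are kept in the view but, as in the text, only visible sd-gateways become nodes of the gateway-level graph.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Cost = Tuple[float, ...]          # assumed: one value per ToS constraint
SdGateway = Tuple[str, str]       # (superdomain id U, gateway id g)


@dataclass
class GatewayEntry:
    """One element of Gateways&Edges_x(U): gateway y with its timestamp and edges."""
    timestamp: int
    edges: Dict[str, Cost]        # z -> cost of edge (U:y, z)


@dataclass
class ViewEntry:
    """One element of View_x for a superdomain U."""
    strong_constraints: object    # left opaque in this sketch
    weak_constraints: object
    gateways: Dict[str, GatewayEntry] = field(default_factory=dict)


View = Dict[str, ViewEntry]       # U -> entry, for U visible to x or an ancestor of x


def gateway_graph(view: View, visible: Set[str]) -> Dict[SdGateway, Dict[SdGateway, Cost]]:
    """Connect each visible sd-gateway U:g to the visible V:h for every edge (U:g, h)."""
    # Among visible superdomains each gateway h has a single owner V.
    owner = {h: v for v in visible for h in view[v].gateways}
    graph: Dict[SdGateway, Dict[SdGateway, Cost]] = {}
    for u in visible:
        for g, entry in view[u].gateways.items():
            out = graph.setdefault((u, g), {})
            for h, cost in entry.edges.items():
                if h in owner:
                    out[(owner[h], h)] = cost
    return graph


VIEW: View = {
    "A": ViewEntry("strong-A", "weak-A", {
        "a1": GatewayEntry(1, {"a2": (2.0,)}),                 # virtual-edge (A:a1, a2)
        "a2": GatewayEntry(1, {"a1": (2.0,), "b1": (1.0,)}),  # plus actual-edge (A:a2, b1)
    }),
    "B": ViewEntry("strong-B", "weak-B", {"b1": GatewayEntry(4, {"a2": (1.0,)})}),
    # Ancestor entry: stored in the view but not visible, so it adds no graph nodes.
    "G": ViewEntry("strong-G", "weak-G", {"a1": GatewayEntry(1, {"x": (5.0,)})}),
}
graph = gateway_graph(VIEW, visible={"A", "B"})
```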

A  superdomain-level  source  route  is  now  a  sequence  of  sd-gateway  ids.  With  this  definition,  it 
is  easy  to  verify  that  whenever  the  next  superdomain  in  a  superdomain-level  source  route  is  not 
visible  to  a  router,  there  is  an  actual-edge  (hence  a  link)  between  the  router  and  the  next  gateway 
in  this  route. 
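The text does not prescribe how the source router searches its view for such a route. As one possible choice (an assumption, not the authors' algorithm), a Dijkstra search over sd-gateways yields a superdomain-level source route; the sketch uses a single scalar cost per edge and a set of allowed superdomains as a stand-in for policy constraints, and it never crosses an edge of cost ∞.

```python
import heapq
import math
from typing import Dict, List, Optional, Set, Tuple

SdGateway = Tuple[str, str]  # (superdomain id U, gateway id g)


def find_source_route(graph: Dict[SdGateway, Dict[SdGateway, float]],
                      sources: List[SdGateway],
                      dest_sd: str,
                      allowed: Set[str]) -> Optional[List[SdGateway]]:
    """Cheapest sequence of sd-gateways from a source sd-gateway into dest_sd."""
    dist: Dict[SdGateway, float] = {}
    prev: Dict[SdGateway, SdGateway] = {}
    heap: List[Tuple[float, SdGateway]] = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue                      # stale heap entry
        if u[0] == dest_sd:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, cost in graph.get(u, {}).items():
            if cost == math.inf or (v[0] not in allowed and v[0] != dest_sd):
                continue                  # edge down, or superdomain not allowed
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


GRAPH = {
    ("S", "s1"): {("A", "a1"): 1.0, ("R", "r1"): 1.0},
    ("A", "a1"): {("A", "a2"): 2.0},
    ("A", "a2"): {("D", "d1"): 1.0},
    ("R", "r1"): {("D", "d1"): 1.0},
}
print(find_source_route(GRAPH, [("S", "s1")], "D", {"S", "A"}))
```

With R disallowed the route detours through the virtual-edge of A; allowing R yields the cheaper route through R.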

4  Edge-Costs  and  Topology  Changes 

A cost is associated with each edge. The cost of an edge equals a vector of values if the edge is up; each cost value indicates how expensive it is to cross the edge according to some ToS constraint. The cost equals ∞ if the edge is an actual-edge and it is down, or if the edge is a virtual-edge (U:g, h) and h cannot be reached from g without leaving U.

Since an actual-edge represents a physical link, its cost can be determined from measured link statistics. The cost of a virtual-edge (U:g, h) is an aggregation of the cost of physical links in U and is calculated as follows: if U is a domain, the cost of (U:g, h) is calculated as the maximum/minimum/average cost of the routes within U from g to h [4]. For higher-level superdomains U, the cost of (U:g, h) is derived from the costs of edges between the gateways of the child superdomains of U.
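For a domain U, aggregating route costs into a virtual-edge cost can be sketched as follows. The function name, the mode strings, and the assumption that every route cost is a vector with one entry per ToS constraint are illustrative; the derivation for higher-level superdomains from their children's edges is not shown.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

Cost = Tuple[float, ...]


def virtual_edge_cost(route_costs: Sequence[Cost], mode: str = "max", n_tos: int = 1) -> Cost:
    """Cost of virtual-edge (U:g, h) for a domain U from the costs of its g-to-h routes."""
    if not route_costs:
        # h cannot be reached from g without leaving U.
        return tuple(math.inf for _ in range(n_tos))
    aggregators = {"max": max, "min": min, "average": mean}
    if mode not in aggregators:
        raise ValueError(f"unknown mode: {mode!r}")
    agg = aggregators[mode]
    # Aggregate each ToS component separately over all routes.
    return tuple(agg(column) for column in zip(*route_costs))


ROUTES = [(3.0, 10.0), (5.0, 20.0)]   # hypothetical (delay, price) of two routes g -> h
print(virtual_edge_cost(ROUTES, "average", n_tos=2))
```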

Link  cost  changes  and  link/node  failures  and  repairs  correspond  to  cost  changes,  failures  and 
repairs  of  actual-  and  virtual-edges.  Thus  the  attributes  of  edges  in  the  views  of  routers  must  be 
regularly  updated.  For  this,  we  employ  a  view-update  protocol  (see  Section  6). 


Link/node  failures  can  also  partition  a  superdomain  into  cells,  where  a  cell  of  a  superdomain 
is  defined  to  be  a  maximal  subset  of  nodes  of  the  superdomain  that  can  reach  each  other  without 
leaving  the  superdomain.  Superdomain  partitions  can  occur  at  any  level  in  the  hierarchy.  For 
example, suppose U is a domain and V is its parent superdomain. U can be partitioned into cells
without  V  being  partitioned  (i.e.  if  the  cells  of  U  can  reach  each  other  without  leaving  V).  The 
opposite  can  also  happen:  if  all  links  between  U  and  the  other  children  of  V  fail,  then  V  becomes 
partitioned  but  U  does  not.  Or  both  U  and  V  can  be  partitioned.  In  the  same  way,  link/node 
repairs  can  merge  cells  into  bigger  cells. 
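Cells are the connected components of the subgraph induced by the nodes of the superdomain, so they can be found with a breadth-first search that ignores every link with an endpoint outside U. The node and link encoding is assumed for illustration; the example shows U split into two cells even though its nodes remain connected through an outside node x.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def cells(nodes_in_u: Set[str], up_links: List[Tuple[str, str]]) -> List[Set[str]]:
    """Maximal sets of nodes of U that can reach each other without leaving U."""
    adj: Dict[str, Set[str]] = {n: set() for n in nodes_in_u}
    for a, b in up_links:
        # Only links with both endpoints inside U may be used.
        if a in nodes_in_u and b in nodes_in_u:
            adj[a].add(b)
            adj[b].add(a)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in sorted(nodes_in_u):
        if start in seen:
            continue
        cell = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            n = queue.popleft()
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    cell.add(m)
                    queue.append(m)
        result.append(cell)
    return result


# a-b and c-d are inside U; b and c are joined only through x, which is outside U.
print(cells({"a", "b", "c", "d"}, [("a", "b"), ("c", "d"), ("b", "x"), ("x", "c")]))
```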

We handle superdomain partitioning as follows. A router detects that a superdomain U is partitioned when a virtual-edge of U in the router's view has cost ∞. When a router forwards a packet to a destination for which the visible superdomain, say U, in the destination address is partitioned into cells, a copy of the packet is sent to each cell by sending a copy of the packet to each gateway of U; the id U in the destination address is “marked” in the packet so that subsequent routers do not create new copies of the packet for U.
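The detection rule and the packet-copying step can be sketched as below. The Packet record, its field names, and the representation of the marked ids as a frozen set are hypothetical; an empty result stands for ordinary forwarding of a single copy along the source route.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Packet:
    dest_address: Tuple[str, ...]           # e.g. ("G", "E", "d15", "h")
    marked: FrozenSet[str] = frozenset()    # superdomain ids already split on


def is_partitioned(virtual_edge_costs: Dict[Tuple[str, str], float]) -> bool:
    """U is detected as partitioned when some virtual-edge of U has cost infinity."""
    return any(c == math.inf for c in virtual_edge_costs.values())


def copies_for(packet: Packet, u: str, gateways_of_u: List[str],
               virtual_edge_costs: Dict[Tuple[str, str], float]) -> List[Tuple[str, Packet]]:
    """One (gateway, packet copy) pair per gateway of U when U must be split on.

    An empty list means: forward a single copy along the normal route.
    """
    if u in packet.marked or not is_partitioned(virtual_edge_costs):
        return []
    marked = packet.marked | {u}            # later routers must not split on U again
    return [(g, Packet(packet.dest_address, marked)) for g in gateways_of_u]


PKT = Packet(("G", "E", "d15", "h"))
COSTS = {("g1", "g2"): math.inf, ("g2", "g1"): math.inf}
out = copies_for(PKT, "E", ["g1", "g2"], COSTS)
print([(g, sorted(p.marked)) for g, p in out])
```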

5  View-Query  Protocol

When a source node wants a superdomain-level source route to a destination, a router in its domain examines its view and searches for a valid path (i.e. a superdomain-level source route) using the destination address†. We refer to this router as the source router. Even though the source router does not know the constraints of the individual domains that are to be crossed in each superdomain, it does know the strong and weak constraints of the superdomains. We refer to a superdomain whose strong constraints are satisfied as a valid superdomain. If a superdomain's weak constraints are satisfied but its strong constraints are not, then there may be a valid path through this superdomain. We refer to such a superdomain as a candidate superdomain.

A  path  is  valid  if  it  involves  only  valid  superdomains.  A  path  cannot  be  valid  if  it  involves 
a  superdomain  which  is  neither  valid  nor  candidate.  We  refer  to  a  path  involving  only  valid  and 
candidate  superdomains  as  a  candidate  path. 
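The classification of superdomains and paths can be sketched with the strong and weak constraints reduced to two membership tests. Representing the constraints as predicates over superdomain ids, and the example ids, are assumptions for illustration.

```python
from typing import Callable, Iterable

Predicate = Callable[[str], bool]


def classify_superdomain(u: str, strong_ok: Predicate, weak_ok: Predicate) -> str:
    if strong_ok(u):
        return "valid"
    if weak_ok(u):
        return "candidate"   # a valid path may still exist through u
    return "invalid"         # no valid path can go through u


def classify_path(path: Iterable[str], strong_ok: Predicate, weak_ok: Predicate) -> str:
    kinds = {classify_superdomain(u, strong_ok, weak_ok) for u in path}
    if kinds <= {"valid"}:
        return "valid"
    if kinds <= {"valid", "candidate"}:
        return "candidate"
    return "invalid"


STRONG_OK = {"d2", "B", "C"}    # hypothetical: strong constraints satisfied
WEAK_OK = STRONG_OK | {"G"}     # hypothetical: weak constraints satisfied
print(classify_path(["d2", "B", "G", "C"], STRONG_OK.__contains__, WEAK_OK.__contains__))
```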

† We assume that the source has the destination's address. If that is not the case, it would first query the name servers to obtain the address for the destination. Querying the name servers can be done the same way it is done currently in the Internet. It requires nodes to have a set of fixed addresses of name servers. This is also sufficient in our case.


If the source router's view contains a candidate path $(U_0{:}g_{00}, \ldots, U_0{:}g_{0n_0}, U_1{:}g_{10}, \ldots, U_1{:}g_{1n_1}, \ldots, U_m{:}g_{m0}, \ldots, U_m{:}g_{mn_m})$ to the destination (and does not contain a valid path), then for each candidate superdomain $U_i$ on this path, the source router queries gateway $g_{i0}$ of $U_i$ for the internal view of $U_i$. This internal view consists of the constraints, sd-gateways and edges of the child superdomains of $U_i$.

When  a  router  x  receives  a  request  for  the  internal  view  of  an  ancestor  superdomain  U,  it 
returns  the  following  data  structure: 

IView_x(U): internal view of U at router x.

$$\mathit{IView}_x(U) = \{\, (V,\ \mathit{strong\_constraints}(V),\ \mathit{weak\_constraints}(V),\ \mathit{Gateways\&Edges}_x(V)) \in \mathit{View}_x : V \in \mathit{Children}(U) \,\}$$

It is to simplify the construction of IView_x(U) that we store information about ancestor superdomains in the view of router x. Instead of storing this information, router x could construct IView_x(U) from the constraints, sd-gateways and edges of the visible descendants of U. We did not choose this alternative because the extra information does not increase storage complexity.

When the source router receives the internal view of a superdomain U, it does the following: (1) it removes the sd-gateways and edges of U from its view; (2) it adds the sd-gateways and edges of the child superdomains in the internal view of U; and (3) it searches for a valid path again. If there is still no valid path but there are candidate paths, the process is repeated.
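The three-step refinement loop can be sketched with the search and the query abstracted as callables. The function names, the round limit, and the toy search (which treats G as a candidate until its internal view, containing E and d19, has been merged) are hypothetical; the actual protocol issues the queries for one candidate path in parallel and waits for all replies.

```python
from typing import Callable, Dict, List, Optional, Tuple

View = Dict[str, object]               # superdomain id -> constraints, sd-gateways, edges
SearchResult = Tuple[str, List[str]]   # ("valid", path) | ("candidate", superdomains) | ("none", [])


def merge_internal_view(view: View, u: str, iview: View) -> View:
    """Steps (1) and (2): drop U's entry, add the entries of U's children."""
    merged = {sd: info for sd, info in view.items() if sd != u}
    merged.update(iview)
    return merged


def resolve_path(view: View,
                 search: Callable[[View], SearchResult],
                 query: Callable[[str], View],
                 max_rounds: int = 16) -> Optional[List[str]]:
    for _ in range(max_rounds):
        status, items = search(view)           # step (3): search again
        if status == "valid":
            return items                       # a valid path was found
        if status != "candidate":
            return None                        # no valid or candidate path left
        for u in items:                        # candidate superdomains on the path
            view = merge_internal_view(view, u, query(u))
    return None


def toy_search(view: View) -> SearchResult:
    if "G" in view:
        return ("candidate", ["G"])
    if "E" in view:
        return ("valid", ["d1", "E", "dst"])
    return ("none", [])


INITIAL: View = {"d1": "...", "G": "...", "dst": "..."}
INTERNAL: Dict[str, View] = {"G": {"E": "...", "d19": "..."}}
print(resolve_path(INITIAL, toy_search, INTERNAL.__getitem__))
```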

For example, consider Figure 3. For a router in superdomain d1 (see Figure 7), G is visible and is a candidate superdomain. The internal view of G is shown in Figure 9, and the resulting merged view is shown in Figure 10. The valid path through G (visiting d16 and avoiding d19) can be discovered using this merged view (since the strong constraints of E are satisfied).

Consider a candidate route to a destination: $(U_0{:}g_{00}, \ldots, U_0{:}g_{0n_0}, U_1{:}g_{10}, \ldots, U_i{:}g_{i0}, \ldots, U_m{:}g_{m0}, \ldots, U_m{:}g_{mn_m})$. If superdomain $U_i$ is partitioned into cells, it may reappear later in the candidate path (i.e. $U_j = U_i$ for some $j > i$). In this case both gateways $g_{i0}$ and $g_{j0}$ are queried. Timestamps are used to resolve conflicts between the information reported by these gateways.

The view-query protocol uses two types of messages:

•  (RequestIView, sdid, gid, s_address, d_address)

Sent by a source router to gateway gid to obtain the internal view of superdomain sdid. s_address is the address of the source router. d_address is the address of the destination node (of the desired route).

•  (ReplyIView, sdid, gid, iview, d_address)

Sent by gateway gid to the source router. iview is the internal view of superdomain sdid; the other parameters are as in the RequestIView message.

Figure 9: Internal view of G.

Figure 10: Merged view at d1.
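The two message formats and a gateway's construction of IView_x(U) can be sketched as follows. The dataclass encoding, the children map, the placeholder view entries and addresses are assumptions; only the message field names follow the definitions above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set


@dataclass
class RequestIView:
    sdid: str        # superdomain whose internal view is wanted
    gid: str         # gateway of sdid that is asked
    s_address: str   # address of the source router
    d_address: str   # address of the destination node


@dataclass
class ReplyIView:
    sdid: str
    gid: str
    iview: Dict[str, Any]
    d_address: str


def internal_view(view: Dict[str, Any], children: Dict[str, Set[str]], u: str) -> Dict[str, Any]:
    """IView_x(U): the entries of View_x that belong to children of U."""
    return {v: view[v] for v in children.get(u, set()) if v in view}


def handle_request(req: RequestIView, my_gid: str, view: Dict[str, Any],
                   children: Dict[str, Set[str]]) -> ReplyIView:
    """Run at gateway gid: answer with the internal view of ancestor superdomain sdid."""
    if req.gid != my_gid:
        raise ValueError("request addressed to a different gateway")
    return ReplyIView(req.sdid, req.gid, internal_view(view, children, req.sdid), req.d_address)


VIEW = {"E": "entry-E", "d19": "entry-d19", "G": "entry-G", "B": "entry-B"}
CHILDREN = {"G": {"E", "d19"}}
reply = handle_request(RequestIView("G", "g", "src-addr", "dst-addr"), "g", VIEW, CHILDREN)
print(reply.iview)
```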

The state maintained by a source router x is listed in Figure 15. PendingReq_x is used to avoid sending new request messages before receiving all outstanding reply messages. IViews_x and PendingReq_x are allocated and deallocated on demand for each destination.

The events of router x are specified in Figure 15. In the figure, * is a wild-card matching any value. The TimeOut_x event is executed after a time-out period from the execution of the Request_x event to indicate that the request has not been satisfied. The source host can then repeat the same request afterwards.
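The bookkeeping role of PendingReq_x can be illustrated with a small per-destination tracker. The class, its methods, and the explicit `now` argument (used instead of a real clock so the behaviour is deterministic) are hypothetical, and the sketch simplifies by timing out each round of requests rather than the whole source-host request.

```python
from typing import Dict, List, Set, Tuple


class PendingRequests:
    """Per-destination bookkeeping in the spirit of PendingReq_x (hypothetical sketch)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._pending: Dict[str, Tuple[Set[str], float]] = {}  # dest -> (gateways, start)

    def start(self, dest: str, gateways: Set[str], now: float) -> bool:
        """Record new requests; refuse while replies for dest are still outstanding."""
        if dest in self._pending:
            return False
        self._pending[dest] = (set(gateways), now)
        return True

    def reply(self, dest: str, gid: str) -> bool:
        """Record a reply; True once every outstanding reply for dest has arrived."""
        entry = self._pending.get(dest)
        if entry is None:
            return False                  # unexpected reply: ignore it
        entry[0].discard(gid)
        if entry[0]:
            return False
        del self._pending[dest]           # all replies in: deallocate the state
        return True

    def timed_out(self, now: float) -> List[str]:
        """Destinations whose requests exceeded the time-out; their state is dropped."""
        expired = [d for d, (_, t0) in self._pending.items() if now - t0 >= self.timeout]
        for d in expired:
            del self._pending[d]
        return expired


pr = PendingRequests(timeout=5.0)
print(pr.start("dst", {"gA", "gB"}, now=0.0), pr.start("dst", {"gC"}, now=1.0))
```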

The procedure search_x uses an operation “ReliableSend(m) to v”, where m is the message being sent and v is either an address of an arbitrary router or an id of a gateway of a visible superdomain. ReliableSend is asynchronous. The message is delivered to v as long as there is a sequence of up links between the sender u and v.‡ (Note that an address is not needed to obtain an inter-domain route to a gateway of a visible superdomain.)

Router Failure Model: A router can undergo failures and recoveries at any time. We assume failures are fail-stop (i.e. a failed router does not send erroneous messages). When a router x recovers, the variables IViews_x and PendingReq_x are lost for all destinations. The cost of each edge in View_x is set to ∞. It becomes up-to-date as the router receives new information from other routers.

‡ This involves time-outs, retransmissions, etc. It requires transport protocol support such as TCP.


6  View-Update  Protocol 

For each ancestor superdomain U of its domain, a gateway g informs other routers of topology changes (i.e. failures, repairs and cost changes) affecting the edges of U:g. The communication is done by flooding messages. The flooding is restricted to the routers in the parent superdomain of U, since U is visible only to these routers.

Due to the nature of flooding, a router can receive information out of order from a gateway. In order to avoid old information replacing new information, each gateway includes increasing timestamps in the messages it sends. Routers maintain for each gateway the highest received timestamp (in the timestamp field in View_x) and discard messages with smaller timestamps. Timestamps do not have to be real-time clock values.

Due to superdomain partitioning, messages sent by a gateway may not reach all routers within the parent superdomain, resulting in some routers having out-of-date information. This out-of-date information can cause inconsistencies when the partition is repaired. To eliminate inconsistencies, when a link recovers, the two routers at the ends of the link exchange their views and flood any new information. As usual, information about a superdomain U is flooded over U's parent superdomain.

The  view-update  protocol  uses  messages  of  the  following  form: 

•  (Update, sdid, gid, timestamp, edge-set)

Sent by the gateway gid to inform other routers about the current attributes of the edges of sdid:gid. timestamp is the timestamp of gid. edge-set contains a cost for each edge.

The state maintained by a router x is listed in Figure 16. Note that AdjLocalRouters_x or AdjForeignGateways_x can be empty. IntraDomainRT_x contains a route (next-hop or source)§ for every reachable node of the domain. We assume that consecutive reads of Clock_x return increasing values.

Routers also receive and flood messages containing edges of sd-gateways of their ancestor superdomains. This information is used by the query protocol (see Section 5). Also, the highest timestamp received from a gateway g of an ancestor superdomain is needed to avoid exchanging the messages of g infinitely during flooding.

§ IntraDomainRT_x is a view in case of a link-state routing protocol or a distance table in case of a distance-vector routing protocol.

The events of router x are specified in Figure 16. We use Ancestor_i(U) to denote the superdomain-id of the ith ancestor of U, where Ancestor_0(U) = U. In the view-update protocol, a node u uses send operations of the form “Send(m) to v”, where m is the message being sent and v is the destination-id. Here, nodes u and v are neighbors, and the message is sent over the physical link (u, v). If the link is down, we assume that the packet is dropped.
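The timestamp rule and the flooding scope can be sketched together, using Ancestor_i(U) to find the parent superdomain over which an update about U:g is flooded. The Router class, its fields, the example hierarchy, and the choice to flood updates about a top-level superdomain to every neighbour are assumptions. The sketch also treats a message whose timestamp equals the stored one as a duplicate, which is what stops it from circulating forever.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical hierarchy (child -> parent, None at the top level).
PARENT: Dict[str, Optional[str]] = {
    "d16": "E", "d17": "E", "E": "G", "G": None, "d3": "B", "B": None,
}


def ancestor(u: str, i: int) -> Optional[str]:
    """Ancestor_i(U), with Ancestor_0(U) = U; None above the top level."""
    node: Optional[str] = u
    for _ in range(i):
        if node is None:
            return None
        node = PARENT[node]
    return node


def contains(sd: str, domain: str) -> bool:
    """True if domain lies inside superdomain sd."""
    node: Optional[str] = domain
    while node is not None:
        if node == sd:
            return True
        node = PARENT[node]
    return False


@dataclass
class Update:
    sdid: str
    gid: str
    timestamp: int
    edge_set: Dict[str, float]   # edge target -> cost


class Router:
    def __init__(self, neighbours: Dict[str, str]) -> None:
        self.neighbours = neighbours                     # neighbour id -> its domain
        self.highest: Dict[Tuple[str, str], int] = {}    # (sdid, gid) -> highest timestamp
        self.edges: Dict[Tuple[str, str], Dict[str, float]] = {}

    def on_update(self, msg: Update) -> List[str]:
        """Apply msg if it is newer; return the neighbours to flood it to."""
        key = (msg.sdid, msg.gid)
        if msg.timestamp <= self.highest.get(key, -1):
            return []                                    # old or already seen: discard
        self.highest[key] = msg.timestamp
        self.edges[key] = dict(msg.edge_set)
        scope = ancestor(msg.sdid, 1)                    # parent superdomain of U
        return [n for n, d in self.neighbours.items()
                if scope is None or contains(scope, d)]


r = Router({"r1": "d17", "r2": "d3"})
print(r.on_update(Update("d16", "g", 5, {"h": 1.0})))   # flooded only inside E
```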

7  Evaluation 

In  the  superdomain  hierarchy  (without  the  query  protocol),  the  number  of  superdomains  in  a  view 
is logarithmic in the number of superdomains in the internetwork [10].* However, the storage
required  for  a  view  is  proportional  not  to  the  number  of  superdomains  in  it  but  to  the  number  of 
sd-gateways  in  it.  As  we  have  seen,  there  can  be  more  than  one  sd-gateway  for  a  superdomain  in 
a  view. 

In fact, the superdomain hierarchy does not scale up for arbitrary internetworks; that is, the number of sd-gateways in a view can be proportional to the number of domains in the internetwork. For example, if each domain in a superdomain U has a distinct gateway with a link to outside U, the number of sd-gateways of U would be linear in the number of domains in U.

The good news is that the superdomain hierarchy does scale up for realistic internetwork topologies. A sufficient condition for scaling is that each superdomain has at most log N_d sd-gateways; this condition is satisfied by realistic internetworks since most domain interconnections are “hierarchical connections”, i.e. between backbones and regionals, between regionals and MANs, and so on.
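The contrast between the two cases can be checked by counting the gateways g for which U:g is an sd-gateway. The topology is a made-up example with K = 8 domains in U: in the first variant every domain has its own link leaving U, and in the second the domains attach to a regional r inside U whose gateway alone leaves U.

```python
from typing import Dict, List, Optional, Set, Tuple


def inside(sd: str, domain: str, parent: Dict[str, Optional[str]]) -> bool:
    """True if domain lies inside superdomain sd."""
    node: Optional[str] = domain
    while node is not None:
        if node == sd:
            return True
        node = parent[node]
    return False


def count_sd_gateways(u: str, links: List[Tuple[str, str]],
                      node_domain: Dict[str, str],
                      parent: Dict[str, Optional[str]]) -> int:
    """Number of gateways g such that U:g is an sd-gateway."""
    gateways: Set[str] = set()
    for a, b in links:
        for x, y in ((a, b), (b, a)):
            if inside(u, node_domain[x], parent) and not inside(u, node_domain[y], parent):
                gateways.add(x)
    return len(gateways)


K = 8
PARENT: Dict[str, Optional[str]] = {"U": None, "X": None, "r": "U",
                                    **{f"d{i}": "U" for i in range(K)}}
NODE_DOMAIN = {"x": "X", "gr": "r", **{f"g{i}": f"d{i}" for i in range(K)}}

# Every domain of U has its own gateway with a link leaving U.
flat = [(f"g{i}", "x") for i in range(K)]
# Domains attach to the regional r inside U; only r's gateway leaves U.
hierarchical = [(f"g{i}", "gr") for i in range(K)] + [("gr", "x")]

print(count_sd_gateways("U", flat, NODE_DOMAIN, PARENT))
print(count_sd_gateways("U", hierarchical, NODE_DOMAIN, PARENT))
```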

In this section, we present an evaluation of the scaling properties of the superdomain hierarchy and the query protocol. To evaluate any inter-domain routing protocol, we need a model in which we can define internetwork topologies, policy/ToS constraints, inter-domain routing hierarchies, and evaluation measures (e.g. memory and time requirements). We have recently developed such a model [3]. We first describe our model, and then use it to evaluate our superdomain hierarchy. Our evaluation measures are the amount of memory required at the routers and the amount of time needed to construct a path.

* Even though the results in [10] were for intra-domain routing, it is easy to show that the analysis there holds for inter-domain routing as well.


7.1  Evaluation  Model 

We first describe our method of generating topologies and policy/ToS constraints. We then describe
the  evaluation  measures. 

Generating  Internetwork  Topologies 

For our purposes, an internetwork topology is a directed graph where the nodes correspond to domains and the edges correspond to domain-level connections. However, an arbitrary graph will not do. The topology should have the characteristics of a real internetwork, like the Internet. That is, it should have backbones, regionals, MANs, LANs, etc.; there should be hierarchical connections, but some “non-hierarchical” connections should also be present.

For brevity, we refer to backbones as class 0 domains, regionals as class 1 domains, metropolitan-area domains and providers as class 2 domains, and campus and local-area domains as class 3 domains. A (strictly) hierarchical interconnection of domains means that class 0 domains are connected to each other, and for i > 0, class i domains are connected to class i − 1 domains. As mentioned above, we also want some “non-hierarchical” connections, i.e., domain-level edges between domains irrespective of their classes (e.g. from a campus domain to another campus domain or to a backbone domain).

In reality, domains span geographical regions and domain-level edges are often between domains that are geographically close (e.g. the University of Maryland campus domain is connected to the SURANET regional domain, both of which are on the east coast). We also want some edges that are between far domains. A class i domain usually spans a larger geographical region than a class i + 1 domain. To generate such interconnections, we associate a “region” attribute with each domain. The intention is that two domains with the same region are geographically close.

The region of a class i domain has the form r_0.r_1.⋯.r_i, where the r_j's are integers. For example, the region of a class 3 domain can be 1.2.3.4. For brevity, we refer to the region of a class i domain as a class i region.

Note that regions have their own hierarchy, which should not be confused with the superdomain hierarchy. Class 0 regions are the top-level regions. We say that a class i region r_0.r_1.⋯.r_{i−1}.r_i is contained in the class i − 1 region r_0.r_1.⋯.r_{i−1} (where i > 0). Containment is transitive. Thus region 1.2.3.4 is contained in regions 1.2.3, 1.2 and 1.


Figure 11: Regions.

Given any pair of domains, we classify them as local, remote or far, based on their regions. Let X be a class i domain and Y a class j domain, and without loss of generality let i ≤ j. X and Y are local if they are in the same class i region. For example in Figure 11, A is local to … and Q. X and Y are remote if they are not in the same class i region but they are in the same class i − 1 region, or if they are not local and i = 0. For example in Figure 11, some of the domains A is remote to are D, E, F, and L. X and Y are far if they are neither local nor remote. For example in Figure 11, A is far from J.

We refer to a domain-level edge as local (remote, or far) if the two domains it connects are local (remote, or far).
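The local/remote/far classification reduces to comparing region prefixes. This sketch assumes regions are written as dotted strings such as "1.2.3.4", so that a class i region has i + 1 components; the example regions are invented rather than taken from Figure 11.

```python
from typing import List


def classify(region_x: str, region_y: str) -> str:
    """Classify two domains as 'local', 'remote' or 'far' from their regions."""
    x: List[str] = region_x.split(".")
    y: List[str] = region_y.split(".")
    if len(x) > len(y):
        x, y = y, x                  # make X the domain of lower class i (i <= j)
    i = len(x) - 1
    if y[: i + 1] == x:              # Y lies in X's class i region
        return "local"
    if i == 0 or y[:i] == x[:i]:     # same class i - 1 region, or X is a backbone
        return "remote"
    return "far"


print(classify("1.2", "1.2.3.4"), classify("1.2", "1.3.1.1"), classify("1.2", "2.2.3.4"))
```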

We use the following procedure to generate internetwork topologies:

•  We first specify the number of domain classes, and the number of domains in each class.

•  We next specify the regions. Note that the number of region classes equals the number of domain classes. We specify the number of class 0 regions. For each class i > 0, we specify a branching factor, which creates that many class i regions in each class i − 1 region. (That is, if there are two class 0 regions and the class 1 branching factor equals three, then there are six class 1 regions.)

•  For each class i, we randomly map the class i domains into the class i regions. Note that several domains can be mapped to the same region, and some regions may have no domain mapped into them.

•  For every class i and every class j, j ≥ i, we specify the number of local, remote and far edges to be introduced between class i domains and class j domains. The end points of the edges are chosen randomly (within the specified constraints).

•  We ensure that the internetwork topology is connected by ensuring that the subgraph of class 0 domains is connected, and each class i domain, for i > 0, is connected to a local class i − 1 domain.

•  Each domain has one gateway, so all neighbors of a domain are connected via this gateway. This is for simplicity.
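The region-construction and random-mapping steps of this procedure can be sketched as follows; edge generation and the connectivity step are omitted. The function names, the domain ids of the form c<class>-<index>, and the random seed are hypothetical. With four class 0 regions and a branching factor of 4, the sketch yields 4, 16, 64 and 256 regions per class, which matches the region counts in Table 1.

```python
import random
from typing import Dict, List


def make_regions(n_class0: int, branching: List[int]) -> List[List[str]]:
    """regions[i] lists the class i regions; branching[i - 1] is the class i factor."""
    regions = [[str(r) for r in range(1, n_class0 + 1)]]
    for factor in branching:
        regions.append([f"{p}.{k}" for p in regions[-1] for k in range(1, factor + 1)])
    return regions


def map_domains(n_domains: List[int], regions: List[List[str]],
                rng: random.Random) -> Dict[str, str]:
    """Randomly map each class i domain to a class i region (regions may stay empty)."""
    mapping: Dict[str, str] = {}
    for i, count in enumerate(n_domains):
        for k in range(count):
            mapping[f"c{i}-{k}"] = rng.choice(regions[i])
    return mapping


regions = make_regions(4, [4, 4, 4])
print([len(r) for r in regions])
domains = map_domains([10, 100, 1000, 10000], regions, random.Random(1))
```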

Choosing Policy/ToS Constraints

We chose a simple scheme to model policy/ToS constraints. Each domain is assigned a color: green
or  red.  For  each  domain  class,  we  specify  the  percentage  of  green  domains  in  that  class,  and  then 
randomly  choose  a  color  for  each  domain  in  that  class. 

A  valid  route  from  a  source  to  a  destination  is  one  that  does  not  visit  any  red  intermediate 
domains;  the  source  and  destination  domains  are  allowed  to  be  red. 

This  simple  scheme  can  model  many  realistic  policy/ToS  constraints,  such  as  security  constraints 
and  bandwidth  requirements.  It  cannot  model  some  important  kinds  of  constraints,  such  as  delay 
bounds. 
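Under the color model, the shortest valid path can be found by a breadth-first search that may start and end at red domains but never passes through one. The adjacency-list encoding and the example graph are assumptions, and path length is counted in domain-level edges.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_valid_path_length(adj: Dict[str, List[str]], color: Dict[str, str],
                               src: str, dst: str) -> Optional[int]:
    """Edges on the shortest path from src to dst with no red intermediate domain."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v == dst:
                return dist[u] + 1          # the destination itself may be red
            if v in dist or color[v] != "green":
                continue                    # red domains cannot be crossed
            dist[v] = dist[u] + 1
            queue.append(v)
    return None


ADJ = {"s": ["r", "g1"], "r": ["t"], "g1": ["g2"], "g2": ["t"]}
COLOR = {"s": "red", "r": "red", "g1": "green", "g2": "green", "t": "red"}
print(shortest_valid_path_length(ADJ, COLOR, "s", "t"))
```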


Computing  Evaluation  Measures 

The evaluation measures of most interest for an inter-domain routing protocol are its memory, time and communication requirements. We postpone the precise definitions of the evaluation measures to the next subsection.

The only analysis method we have at present is to numerically compute the evaluation measures for a variety of source-destination pairs. Because we use internetwork topologies of large sizes, it is not feasible to compute them for all possible source-destination pairs. We randomly choose a set of source-destination pairs that satisfy the following conditions: (1) the source and destination domains are different stub domains, and (2) there exists a valid path from the source domain to the destination domain in the internetwork topology. (Note that the straightforward scheme would always find such a path.)

7.2  Application  to  Superdomain  Query  Protocol 

We use the above model to evaluate our superdomain query protocol for several different superdomain hierarchies. For each hierarchy, we define a set of superdomain-ids and a parent-child relationship on them.

The first superdomain hierarchy scheme is referred to as child-domains. Each domain d (regardless of its class) is a level-1 superdomain, also identified as d. In addition, for each backbone d, we create a distinct level-4 superdomain referred to as d-4. For each regional d, we create a distinct level-3 superdomain d-3 and make it a child of a randomly chosen level-4 superdomain e-4 such that d and e are local and connected. For each MAN d, we create a distinct level-2 superdomain d-2 and make it a child of a randomly chosen level-3 superdomain e-3 such that d and e are local and connected. Please see Figure 12.

We next describe how the level-1 superdomains (i.e. the domains) are placed in the hierarchy. A backbone d is placed in, i.e. as a child of, d-4. A regional d is placed in d-3. A MAN d is placed in d-2. A stub d is placed in e-2 such that d and e are local and connected. Please see Figure 12.

The second superdomain hierarchy scheme is referred to as sibling-domains. It is identical to child-domains except for the placement of level-1 superdomains corresponding to backbones, regionals and MANs. In sibling-domains, a backbone d is placed as a sibling of d-4. A regional d is placed as a sibling of d-3. A MAN d is placed as a sibling of d-2. Please see Figure 13.


The third superdomain hierarchy scheme is referred to as leaf-domains. In leaf-domains, backbones and regionals are placed in some level-2 superdomain, as follows. A regional d is placed in e-2 if superdomain d-3 has a child superdomain e-2; otherwise, a new level-2 superdomain d-2 is created and placed in d-3, and d is placed in d-2. A backbone d is placed in the level-2 superdomain containing the regional f if superdomain d-4 has a child superdomain f-3; otherwise, a new level-3 superdomain d-3 is created and placed in d-4, a new level-2 superdomain d-2 is created and placed in d-3, and d is placed in d-2. Please see Figure 14.

Note that in leaf-domains, all level-1 superdomains are placed under level-2 superdomains, whereas the other schemes allow some level-1 superdomains to be placed under higher-level superdomains.


Figure 14: leaf-domains


The fourth superdomain hierarchy scheme is referred to as regions. In this scheme, the superdomain hierarchy corresponds exactly to the region hierarchy used to generate the internetwork topology. That is, for a class 1 region x there is a distinct level 5 (top level) superdomain x-5. For a class 2 region x.y there is a distinct level 4 superdomain x.y-4 placed under level 5 superdomain x-5, and so on. Each domain is placed under the superdomain of its region. Please see Figure 11.


Results  for  Internetwork  1 


The  parameters  of  the  first  internetwork  topology,  referred  to  as  Internetwork  1,  are  shown  in 
Table  1. 


                                                         Edges between classes i and j
Class i   Domains   Regions   Green domains (fraction)   Class j   Local   Remote   Far
0         10        4         0.80                       0         8       6        0
1         100       16        0.75                       0         190     20       0
                                                         1         26      5        0
2         1000      64        0.70                       0         100     0        0
                                                         1         1060    40       0
                                                         2         200     40       0
3         10000     256       0.20                       0         100     0        0
                                                         1         100     0        0
                                                         2         10100   50       0
                                                         3         50      50       50

Table 1: Parameters of Internetwork 1.


Our evaluation measures were computed for a (randomly chosen but fixed) set of 100,000 source-destination pairs. For a source-destination pair, we refer to the length of the shortest valid path in the internetwork topology as the shortest-path length, or spl for short. The minimum spl of these pairs was 2, the maximum spl was 15, and the average spl was 6.84.

For each source-destination pair, the set of candidate paths is examined in shortest-first order until either a valid path is found or the set is exhausted without finding one. For each candidate path, RequestIView messages are sent to all candidate superdomains on this path in parallel. All ReplyIView messages are received in time proportional to the round-trip time to the farthest of these superdomains. Hence, the total time requirement is proportional to the number of candidate paths queried multiplied by the round-trip time to the farthest superdomain in these paths. Let msgsize denote the sum of the average RequestIView message size and the average ReplyIView message size. The number of candidate superdomains queried times msgsize indicates the communication capacity required to ship the RequestIView and ReplyIView messages.

Branching factor is 4 for all region classes.

Scheme            No query needed   Candidate paths (avg/max)   Candidate superdomains (avg/max)
child-domains     220               3.31/13                     7.35/38
sibling-domains   220               3/10                        6.17/22
leaf-domains      219               6.31/24                     15.94/66
regions           544               3.70/12                     7.79/30

Table 2: Queries for Internetwork 1.

Table  2  lists  for  each  superdomain  scheme  the  average  and  maximum  number  of  candidate  paths 
and  candidate  superdomains  queried.  As  apparent  from  the  table,  sibling-domaiTis  is  superior  to 
other  schemes  and  leaf-domains  is  much  worse  than  the  rest.  This  is  because  in  leaf-domains,  even 
if  only  one  domain  d  in  a  superdomain  U  is  actually  going  to  be  crossed,  all  descendants  of  U 
containing  d  may  need  to  be  queried  to  obtain  a  valid  path  (e.g.  to  cross  backbone  A  in  Figure  14, 
it  may  be  necessary  to  query  for  superdomain  A-4,  then  B-Z,  then  C-2). 
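To make the cost model concrete, the following Python sketch computes the two estimates for a single source-destination pair. It is a hypothetical illustration, not the code behind Table 2: the function name `query_cost`, the representation of each queried candidate path as a list of round-trip times (in arbitrary units) to its candidate superdomains, and the value of `msgsize` are all assumptions.

```python
from typing import List, Tuple


def query_cost(paths_queried: List[List[float]], msgsize: float) -> Tuple[float, float]:
    """Estimate (time, communication) for one source-destination pair.

    paths_queried: one entry per candidate path examined, each listing the
    round-trip times to the candidate superdomains queried on that path
    (hypothetical units). Follows the proportionality argument in the text.
    """
    if not paths_queried:
        return 0.0, 0.0  # a valid path was already present in the initial view
    # Time: number of candidate paths queried x RTT to the farthest superdomain.
    farthest_rtt = max(max(rtts) for rtts in paths_queried)
    time = len(paths_queried) * farthest_rtt
    # Communication: number of candidate superdomains queried x msgsize.
    superdomains = sum(len(rtts) for rtts in paths_queried)
    return time, superdomains * msgsize


# Two candidate paths: the first queries two superdomains, the second one.
print(query_cost([[10.0, 30.0], [20.0]], msgsize=2.0))  # (60.0, 6.0)
```

Because queries for one path go out in parallel but paths are tried one after another, the time estimate grows with the number of paths, while communication grows with the total number of superdomains queried.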


                  Initial view size (avg/max)        Merged view size (avg/max)
Scheme            in sd-gateways   in superdomains   in sd-gateways   in superdomains
child-domains     964/1006         42/60             1089/1282        100/298
sibling-domains   1167/1269        70/99             1470/2190        148/337
leaf-domains      963/1006         40/60             1108/1322        130/411
regions           492/715          85/163            1042/2687        158/369

Table 3: View sizes for Internetwork 1.

Table 3 lists for each superdomain scheme the average and maximum of the initial view size and of the merged view size. The initial view size indicates the memory requirement at a router without using the query protocol (i.e. assuming the initial view has a valid path). The merged view size indicates the memory requirement at a router during the query protocol (after finding a valid path). The memory requirement at a router is O(view size in number of sd-gateways × E_g), where E_g is the average number of edges of an sd-gateway. Note that the source does not need to store information about red and non-transit domains in the merged views (other than the ones already in the initial view). The numbers for the merged view sizes in Table 3 take advantage of this.

As the table shows, leaf-domains, child-domains and regions scale better than sibling-domains. There are two reasons for this. First, placing a backbone (regional or MAN) domain d as a sibling to d-4 (d-3 or d-2) doubles the number of level 4 (3 or 2) superdomains in the views of routers. Second, since these domains have many edges to the domains in their associated superdomains, the end points of each of these edges become sd-gateways of the associated superdomains. Note that regions scales much better than the other schemes in the initial view size. This is because most edges are local (i.e. contained within regions) and are thus contained completely in superdomains; hence, their end points are not sd-gateways.
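Because the memory requirement is O(view size in sd-gateways × E_g), the extra space used by the query protocol can be estimated from the ratio of merged to initial view sizes, provided E_g is assumed to be the same for both views. The sketch below (hypothetical helper `merged_view_growth`) applies this to the average sd-gateway counts in Table 3. It is back-of-the-envelope arithmetic, not part of the paper's evaluation.

```python
# Average view sizes in sd-gateways from Table 3 (Internetwork 1): (initial, merged).
TABLE3_AVG_SD_GATEWAYS = {
    "child-domains": (964, 1089),
    "sibling-domains": (1167, 1470),
    "leaf-domains": (963, 1108),
    "regions": (492, 1042),
}


def merged_view_growth(sizes: dict) -> dict:
    """Percentage increase of the merged view over the initial view.

    Assumes memory is proportional to (view size in sd-gateways) x E_g with the
    same E_g for both views, so E_g cancels out of the ratio.
    """
    return {scheme: 100.0 * (merged - initial) / initial
            for scheme, (initial, merged) in sizes.items()}


for scheme, pct in merged_view_growth(TABLE3_AVG_SD_GATEWAYS).items():
    print(f"{scheme:16s} +{pct:.1f}%")
```

For the Table 3 averages this gives roughly +13% for child-domains, +26% for sibling-domains, +15% for leaf-domains and +112% for regions.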

Overall,  the  child-domains  and  regions  schemes  scale  best  in  space,  time  and  communication 
requirements.  We  have  repeated  the  above  evaluations  for  two  other  internetworks  and  obtained 
similar  conclusions.  The  results  are  in  Appendix  A. 

8  Related  Work 

In this section, we survey recently proposed inter-domain routing protocols that support ToS and policy routing for large internetworks.

Nimrod [6] and IDPR [16] use the link-state approach with domain-level source routing to enforce policy and ToS constraints, and superdomains to solve the scaling problem. Nimrod is still in the design stage. Both protocols suffer from loss of policy and ToS information, as mentioned in the introduction. A query protocol for Nimrod is being developed to obtain more detailed policy, ToS and topology information.

BGP [12] and IDRP [14] are based on a path-vector approach [15]. Here, for each destination domain, a router maintains a set of paths, one through each of its neighbor routers. ToS and policy information is attached to these paths. Each router requires O(N_d × N_r) space, where N_d is the average number of neighbor domains for a domain and N_r is the number of routers in the internetwork. For each destination, a router exchanges its best valid path with its neighbor routers. However, a path-vector algorithm may not find a valid path from a source to the destination even if such a route exists [16]* (i.e. detailed ToS and policy information may be lost). By exchanging k paths to each destination, the probability of detecting a valid path for each source can be increased. But to guarantee detection, either all possible paths must be exchanged (an exponential number of paths in the worst case) or source policies must be made public and routers must take them into account when exchanging routes. However, this fix increases space and communication requirements drastically.
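The failure mode of path-vector routing described above (and in the footnote example) can be reproduced in a few lines. In this hypothetical Python sketch, router u advertises only its shortest path, and v's source policy forbids transit through domain A. The domain names, the "shortest path" preference and the helper names are all assumptions made for illustration.

```python
from typing import List, Set

Path = List[str]  # hypothetical domain-level path from u onward


def advertise(paths: List[Path]) -> Path:
    """Assumed path-vector behaviour: u advertises only its single preferred path.

    "Preferred" means shortest in this sketch; real routers apply local policy.
    """
    return min(paths, key=len)


def is_valid(path: Path, forbidden: Set[str]) -> bool:
    """Valid for a source whose policy forbids transit through `forbidden` domains."""
    return not any(domain in forbidden for domain in path)


p1 = ["A", "D"]        # P1: via domain A to destination D
p2 = ["B", "C", "D"]   # P2: via domains B and C
paths_at_u = [p1, p2]
forbidden_by_v = {"A"}  # hypothetical source policy of v's domain

learned_by_v = advertise(paths_at_u)                         # v only hears about P1
print(is_valid(learned_by_v, forbidden_by_v))                # False
print(any(is_valid(p, forbidden_by_v) for p in paths_at_u))  # True: P2 would have worked
```

Advertising k > 1 paths would let v see P2 here, which is the trade-off between detection probability and space/communication noted in the text.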

IDRP [14] uses superdomains to solve the scaling problem. It exchanges all paths between neighbor routers subject to the following constraint: a router does not inform a neighbor router of a route if usage of the route by the neighbor would violate some superdomain's constraint on the route. IDRP also suffers from loss of ToS and policy information. To overcome this problem, it uses overlapping superdomains: that is, a domain or a superdomain can be in more than one parent superdomain. If a valid path over a domain cannot be discovered because the constraints of one parent superdomain are violated, the same path may be discovered through another parent superdomain whose constraints are not violated. However, handling ToS and policy constraints in general requires more and more combinations of overlapping superdomains, resulting in larger storage requirements.

Reference [9] combines the benefits of the path-vector and link-state approaches by having two modes: an NR mode, which is an extension of IDRP and is used for the most common ToS and policy constraints; and an SDR mode, which is like IDPR and is used for less frequent ToS and policy requests. This study does not address the scalability of the SDR mode. Ongoing work by this group considers a new SDR mode which is not based on IDPR.

Reference [19] suggests the use of multiple addresses for each node, one for each ToS and policy. This scheme does not scale up. In fact, it increases the storage requirement, since a router maintains a route for each destination address, and there are more addresses with this scheme.

The landmark hierarchy [18, 17] is another approach for solving the scaling problem. Here, each router is a landmark with a radius, and routers which are at most radius away from the landmark maintain a route for it. Landmarks are organized hierarchically, such that the radius of a landmark increases with its level, and the radii of top level landmarks include all routers. Addressing and packet forwarding schemes are introduced. Link-state algorithms cannot be used with the landmark hierarchy, and a thorough study of enforcing ToS and policy constraints with this hierarchy has not been done.

* For example, suppose a router u has two paths P1 and P2 to the destination. Let u have a router neighbor v, which is in another domain. u chooses and informs v of one of the paths, say P1. But P1 may violate source policies of v's domain, and P2 may be a valid path for v.

In [1], we provided an alternative solution to the loss of policy and ToS information that is perhaps more faithful to the original superdomain hierarchy. To handle superdomain-level source routing and topology changes, we augmented each superdomain-level edge (U, V) with the address of an "exit" domain u in U and an "entry" domain v in V. To obtain internal views, we added for each visible superdomain U the edges from U to domains outside the parent of U. Surprisingly, this approach and the gateway-level view approach have the same memory and communication requirements. However, the former approach results in much more complicated protocols.

Reference [2] presents inter-domain routing protocols based on a new kind of hierarchy, referred to as the viewserver hierarchy. This approach also scales well to large internetworks and does not lose detailed ToS and policy information. Here, special routers called viewservers maintain the view of domains in a surrounding precinct. Viewservers are organized hierarchically such that for each viewserver, there is a domain of a lower level viewserver in its view, and views of top level viewservers include domains of other top level viewservers. Appropriate addressing and route discovery schemes are introduced.

9  Conclusion 

We presented a hierarchical inter-domain routing protocol which satisfies policy and ToS constraints, adapts to dynamic topology changes including failures that partition domains, and scales well to a large number of domains.

Our protocol achieves scaling in space requirement by using superdomains. Our protocol maintains superdomain-level views with sd-gateways and handles topology changes by using a link-state view update protocol. It achieves scaling in communication requirement by flooding topology changes affecting a superdomain U over U's parent superdomain.

Our protocol does not lose detail in ToS, policy and topology information. It stores both a strong set of constraints and a weak set of constraints for each visible superdomain. If the weak constraints but not the strong constraints of a superdomain U are satisfied (i.e. the aggregation has resulted in loss of detail in ToS and policy information), then some paths through U may be valid. Our protocol uses a query protocol to obtain a more detailed "internal" view of such superdomains, and searches again for a valid path. Our evaluation results indicate that the query protocol can be performed using 15% extra space.

One  drawback  of  our  protocols  is  that  to  obtain  a  source  route,  views  are  merged  at  or  prior 
to  the  connection  setup,  thereby  increasing  the  setup  time.  This  drawback  is  not  unique  to  our 
scheme  [7,  16,  6,  9].  There  are  several  ways  to  reduce  this  setup  overhead.  First,  source  routes 
to  frequently  used  destinations  can  be  cached.  Second,  the  internal  views  of  frequently  queried 
superdomains  can  be  cached  at  routers  close  to  the  source  domain.  Third,  better  heuristics  to 
choose  candidate  paths  and  candidate  superdomains  to  query  can  be  developed. 

We also described an evaluation model for inter-domain routing protocols. This model can be applied to other inter-domain routing protocols. We have not done so because precise definitions of the hierarchies in these protocols are not available. For example, to do a fair evaluation of IDPR [16], we need precise guidelines for how to group domains into superdomains, and how to choose between the strong and weak methods when defining policy/ToS constraints of superdomains. In fact, these protocols have not been evaluated in a way that allows us to compare them to the superdomain hierarchy.
A  Results  for  Other  Internetworks 


Results  for  Internetwork  2 


The parameters of the second internetwork topology, referred to as Internetwork 2, are the same as the parameters of Internetwork 1, but a different seed is used for random number generation.

Our evaluation measures were computed for a set of 100,000 source-destination pairs. The minimum spl of these pairs was 1, the maximum spl was 14, and the average spl was 7.13.

Tables 4 and 5 show the results. Conclusions similar to those for Internetwork 1 hold.


Results  for  Internetwork  3 


The  parameters  of  the  third  internetwork  topology,  referred  to  as  Internetwork  3,  are  shown  in 
Table  6.  Internetwork  3  is  more  connected,  more  class  0,  1  and  2  domains  are  green,  and  more 
class  3  domains  are  red.  Hence,  we  expect  bigger  view  sizes  in  number  of  sd-gateways. 


Scheme            No query needed   Candidate paths (avg/max)   Candidate superdomains (avg/max)
child-domains     205               4.52/20                     10.22/47
sibling-domains   205               3.01/8                      6.50/21
leaf-domains      205               8.80/32                     21.34/82
regions           640               3.52/10                     7.85/28

Table 4: Queries for Internetwork 2.


                  Initial view size (avg/max)        Merged view size (avg/max)
Scheme            in sd-gateways   in superdomains   in sd-gateways   in superdomains
child-domains     958/1012         43/60             1079/1269        118/306
sibling-domains   1153/1283        72/101            1480/2169        160/324
leaf-domains      956/1009         41/58             1095/1281        156/387
regions           624/1024         110/231           1356/3578        206/435

Table 5: View sizes for Internetwork 2.

Our evaluation measures were computed for a set of 100,000 source-destination pairs. The minimum spl of these pairs was 1, the maximum spl was 11, and the average spl was 5.95.

Tables 7 and 8 show the results. Conclusions similar to those for Internetworks 1 and 2 hold.


Class i  No. of    No. of    % of Green  Class j  Edges between classes i and j
         Domains   Regions   Domains              Local    Remote   Far
0        10        4         0.85        0        8        7        0
1        100       16        0.80        0        190      20       0
                                         1        50       20       0
2        ?         64        0.75        0        500      50       0
                                         1        1200     ?        ?
                                         2        200      40       0
3        10000     256       0.10        0        300      50       0
                                         1        250      100      0
                                         2        10250    150      50
                                         3        200      150      ?

Table 6: Parameters of Internetwork 3.
Note to Table 6: the branching factor is 4 for all domain classes.


Scheme            No query needed   Candidate paths (avg/max)   Candidate superdomains (avg/max)
child-domains     142               3.99/29                     7.70/43
sibling-domains   142               2.95/10                     5.39/22
leaf-domains      142               9.65/70                     18.99/103
regions           676               3.47/17                     6.25/21

Table 7: Queries for Internetwork 3.


                  Initial view size (avg/max)        Merged view size (avg/max)
Scheme            in sd-gateways   in superdomains   in sd-gateways   in superdomains
child-domains     2160/2239        42/60             2354/2647        107/348
sibling-domains   2365/2504        72/?              2606/3314        148/356
leaf-domains      2159/2236        41/58             2386/2645        160/648
regions           1107/1644        110/231           1850/3559        194/436

Table 8: View sizes for Internetwork 3.


Variables:

  View_x. Dynamic view of x.

  WView_x(d_address). Temporary view of x. d_address is the destination address.
      Used for merging internal views of superdomains into the view of x.

  PendingReq_x(d_address). Integer. d_address is the destination address.
      Number of outstanding request messages.

Events:

  Request_x(d_address)   {Executed when x wants a valid domain-level source route}
      allocate WView_x(d_address) := View_x; allocate PendingReq_x(d_address) := 0;
      search_x(d_address);

      where

      search_x(d_address)
          if there is a valid path to d_address in WView_x(d_address) then
              result := shortest valid path;
              deallocate WView_x(d_address), PendingReq_x(d_address);
              return result;
          else if there is a candidate path to d_address in WView_x(d_address) then
              Let cpath = (U_0, g_{0,0}, ..., g_{0,n_0}, ..., U_i, ..., U_m, g_{m,0}, ..., g_{m,n_m})
                  be the shortest candidate path;
              for U_i in cpath such that U_i is candidate do
                  ReliableSend(RequestIView, U_i, address(x), d_address) to g_{i,0};
                  PendingReq_x(d_address) := PendingReq_x(d_address) + 1;
          else
              deallocate WView_x(d_address), PendingReq_x(d_address);
              return failure;
          endif
          endif

  TimeOut_x(d_address)   {Executed after a time-out period and PendingReq_x(d_address) ≠ 0.}
      deallocate WView_x(d_address), PendingReq_x(d_address);
      return failure;

  Receive_x(RequestIView, sdid, x, s_address, d_address)
      ReliableSend(ReplyIView, sdid, x, IView_x(sdid), d_address) to s_address;

  Receive_x(ReplyIView, sdid, gid, iview, d_address)
      if PendingReq_x(d_address) ≠ 0 then   {No time-out happened}
          PendingReq_x(d_address) := PendingReq_x(d_address) - 1;
          {merge internal view}
          delete (sdid, *) from WView_x;
          for (child, scons, wcons, gateway-set) in iview do
              if ¬∃(child, *) ∈ WView_x then
                  insert (child, scons, wcons, gateway-set) in WView_x;
              else
                  for (gid, ts, edge-set) in gateway-set do
                      if ∃(gid, timestamp, *) ∈ Gateways&Edges_x(child) ∧ ts > timestamp then
                          delete (gid, *) from Gateways&Edges_x(child);
                      endif;
                      if ¬∃(gid, *) ∈ Gateways&Edges_x(child) then
                          insert (gid, ts, edge-set) to Gateways&Edges_x(child);
                      endif
              endif
          if PendingReq_x(d_address) = 0 then   {All pending replies are received}
              search_x(d_address);
          endif
      endif

Figure 15: view-query protocol: State and events of a router x.
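The merge step of Receive_x(ReplyIView, ...) can be sketched in Python as follows. This is an illustration under assumed data structures, not the authors' implementation: a view is modeled as a dictionary from superdomain id to a record holding its strong and weak constraints (`scons`, `wcons`) and a `gateways` map from sd-gateway id to a (timestamp, edge set) pair. The function name `merge_internal_view` is hypothetical, and the PendingReq_x bookkeeping and the final call to search_x are left out.

```python
from typing import Dict

# Hypothetical layout for this sketch:
#   view[sdid] = {"scons": ..., "wcons": ..., "gateways": {gid: (timestamp, edge_set)}}
View = Dict[str, dict]


def merge_internal_view(wview: View, sdid: str, iview: View) -> None:
    """Merge the internal view of superdomain `sdid` (from a ReplyIView) into wview in place."""
    wview.pop(sdid, None)  # delete (sdid, *) from WView_x
    for child, rec in iview.items():
        if child not in wview:
            # Child not yet visible: insert it with its constraints and sd-gateways.
            wview[child] = {"scons": rec["scons"], "wcons": rec["wcons"],
                            "gateways": dict(rec["gateways"])}
            continue
        mine = wview[child]["gateways"]
        for gid, (ts, edge_set) in rec["gateways"].items():
            # A strictly newer timestamp in the reply supersedes our entry for gid ...
            if gid in mine and ts > mine[gid][0]:
                del mine[gid]
            # ... and the reply's entry is inserted whenever we have none for gid.
            if gid not in mine:
                mine[gid] = (ts, edge_set)


# Example: expand superdomain "A"; "A.1" is already known from an earlier reply.
wview = {
    "A": {"scons": "s", "wcons": "w", "gateways": {"gA": (1, frozenset())}},
    "A.1": {"scons": "s", "wcons": "w", "gateways": {"g1": (5, frozenset({"old"}))}},
}
iview = {
    "A.1": {"scons": "s", "wcons": "w",
            "gateways": {"g1": (7, frozenset({"new"})), "g2": (3, frozenset())}},
    "A.2": {"scons": "s", "wcons": "w", "gateways": {"g3": (2, frozenset())}},
}
merge_internal_view(wview, "A", iview)
print(sorted(wview))                   # ['A.1', 'A.2']
print(wview["A.1"]["gateways"]["g1"])  # (7, frozenset({'new'}))
```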


Constants:

  AdjLocalRouters_x. (⊆ NodeIds). Set of neighbor routers in x's domain.

  AdjForeignGateways_x. (⊆ NodeIds). Set of neighbor routers in other domains.

  Ancestor_i(x). (∈ SuperDomainIds). ith ancestor of x.

Variables:

  View_x. Dynamic view of x.

  IntraDomainRT_x. Intra-domain routing table of x. Initially contains no entries.

  Clock_x: Integer. Clock of x.

Events:

  Receive_x(Update, sdid, gid, ts, edge-set) from sender
      if ∃(gid, timestamp, *) ∈ Gateways&Edges_x(sdid) ∧ ts > timestamp then
          delete (gid, *) from Gateways&Edges_x(sdid);
      endif;
      if ¬∃(gid, *) ∈ Gateways&Edges_x(sdid) then
          flood_x((Update, sdid, gid, ts, edge-set));
          insert (gid, ts, edge-set) to Gateways&Edges_x(sdid);
          update_parent_domains_x(level(sdid) + 1);
      endif

      where

      update_parent_domains_x(startinglevel)
          for level := startinglevel to number of levels in the hierarchy do
              sdid := Ancestor_level(x);
              if x ∈ Gateways(sdid) then
                  edge-set := aggregate edges of sdid at x using View_x, IntraDomainRT_x and links of x;
                  timestamp := Clock_x;
                  flood_x((Update, sdid, x, timestamp, edge-set));
                  delete (x, *, *) from Gateways&Edges_x(sdid);
                  insert (x, timestamp, edge-set) to Gateways&Edges_x(sdid);
              endif

  DoUpdate_x   {Executed periodically and upon a change in IntraDomainRT_x or links of x}
      update_parent_domains_x(1)

  LinkRecovery_x(y)   {(x, y) is a link. Executed when (x, y) recovers.}
      for all (sdid, *) in View_x do
          if ∃i : Ancestor_i(y) = Ancestor_i(sdid) then
              for all (gid, timestamp, edge-set) in Gateways&Edges_x(sdid) do
                  Send((Update, sdid, gid, timestamp, edge-set)) to y;
          endif

  flood_x(packet)
      for all y ∈ AdjLocalRouters_x do
          Send(packet) to y;
      for all y ∈ AdjForeignGateways_x ∧ ∃i : Ancestor_i(y) = Ancestor_i(packet.sdid) do
          Send(packet) to y;

Figure 16: view-update protocol: State and events of a router x.
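The timestamp rule in Receive_x(Update, ...) ensures that each (superdomain, sd-gateway) entry is replaced only by a strictly newer report, and that only accepted reports are flooded further. Below is a minimal Python sketch of that rule, assuming a dictionary layout for Gateways&Edges_x and a caller-supplied `flood` callback (both hypothetical; the recursive update of parent superdomains is omitted).

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical layout: ge[sdid][gid] = (timestamp, edge_set), standing in for Gateways&Edges_x.
GatewaysEdges = Dict[str, Dict[str, Tuple[int, FrozenSet[str]]]]


def receive_update(ge: GatewaysEdges, sdid: str, gid: str, ts: int,
                   edge_set: FrozenSet[str], flood: Callable[[tuple], None]) -> bool:
    """Handle (Update, sdid, gid, ts, edge_set); return True if it was stored and flooded."""
    entries = ge.setdefault(sdid, {})
    if gid in entries and ts > entries[gid][0]:
        del entries[gid]  # a strictly newer report supersedes the stored one
    if gid not in entries:
        flood(("Update", sdid, gid, ts, edge_set))
        entries[gid] = (ts, edge_set)
        # Figure 16 would now call update_parent_domains_x(level(sdid) + 1);
        # that step is omitted from this sketch.
        return True
    return False  # duplicate or older report: neither stored nor flooded


flooded = []
ge: GatewaysEdges = {}
results = [
    receive_update(ge, "U", "g1", 1, frozenset({"e1"}), flooded.append),  # new
    receive_update(ge, "U", "g1", 1, frozenset({"e1"}), flooded.append),  # duplicate
    receive_update(ge, "U", "g1", 2, frozenset({"e2"}), flooded.append),  # newer
    receive_update(ge, "U", "g1", 0, frozenset(), flooded.append),        # older
]
print(results)       # [True, False, True, False]
print(len(flooded))  # 2
```

Dropping duplicates before flooding is what keeps the flood finite: a router re-floods a report only the first time it sees that timestamp.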
Abstract 

Real-time computer systems have become more and more important in many applications, such as robot control, flight control, and other mission-critical jobs. The correctness of the system depends on the temporal correctness as well as the functional correctness of the tasks. We propose a scheduling algorithm based on an analytic model. Our goal is to derive the optimal schedule for a given set of aperiodic tasks such that the number of rejected tasks is minimized and, subject to that, the finish time of the schedule is also minimized. The scheduling problem with a nonpreemptive discipline in a uniprocessor system is considered. We first show that if a total ordering is given, this can be done in $O(n^2)$ time by a dynamic programming technique, where $n$ is the size of the task set. When the restriction of the total ordering is relaxed, the problem is known to be NP-complete [3]. We discuss the super sequence [18], which has been shown to be useful in reducing the search space for testing the feasibility of a task set. By extending the idea and introducing the concept of conformation, the scheduling process can be divided into two phases: computing the pruned search space, and computing the optimal schedule for each sequence in the search space. While the complexity of the algorithm in the worst case remains exponential, our simulation results show that the cost is reasonable for the average case.

* This work is supported in part by Honeywell under N00014-91-C-0195 and Army/Phillips under DASG-60-92-C-0055. The views, opinions, and/or findings contained in this report are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of Honeywell or Army/Phillips.


1  Introduction 


In a hard real-time system, the computer is required to support the execution of applications in which the timing constraints of the tasks are specified. The correctness of the system depends on the temporal correctness as well as the functional correctness of the tasks. Failure to satisfy the timing constraints can incur fatal errors. Once a task is accepted by the system, the system should be able to finish it within the timing constraint of the task. A task $T_i$ can be characterized as a triple $(r_i, c_i, d_i)$, representing the ready time, the computation time, and the deadline of the task, respectively. A task cannot be started before its ready time. Once started, the task must use the processor for a consecutive period of $c_i$ and be finished by its deadline. The task set is represented as $\Gamma = \{T_1, T_2, \ldots, T_n\}$. A task set is feasible if there exists a schedule in which all the tasks in the task set meet their timing constraints. Scheduling is the process of binding starting times to the tasks such that each task executes according to the schedule. A sequence is written $S = (T_1^S, T_2^S, \ldots, T_k^S)$, where $k \le n$ and $T_i^S$ represents the $i$th task of the sequence $S$ for any $1 \le i \le k$. A sequence specifies the order in which the tasks are executed. Without ambiguity, a schedule can be represented as a sequence. How to schedule the tasks so that the timing constraints are met is nontrivial. Many scheduling problems are known to be intractable [3] in that finding the optimal schedule requires large amounts of computation.
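The task model above can be illustrated with a short Python sketch that executes a sequence nonpreemptively and produces the kind of calendar a time based scheduler outputs: each task starts at the later of its ready time and the previous task's finish time, and the sequence is feasible only if every task meets its deadline. The function name `calendar`, 0-based task indices, and the assumption that the processor is free from time 0 are illustrative choices, not part of the paper.

```python
from typing import List, Optional, Tuple

Task = Tuple[int, int, int]  # (ready time r_i, computation time c_i, deadline d_i)


def calendar(tasks: List[Task], order: List[int]) -> Optional[List[Tuple[int, int, int]]]:
    """Bind start times to the tasks of a sequence, nonpreemptively and in order.

    Returns [(task index, start, finish), ...] if every task meets its deadline,
    or None if the sequence is infeasible. Assumes the processor is free from time 0.
    """
    now = 0
    entries = []
    for i in order:
        r, c, d = tasks[i]
        start = max(now, r)  # not before the ready time, nor before the previous task ends
        finish = start + c   # runs for c consecutive time units
        if finish > d:
            return None      # deadline missed: the sequence is infeasible
        entries.append((i, start, finish))
        now = finish
    return entries


tasks = [(0, 2, 4), (1, 3, 6)]
print(calendar(tasks, [0, 1]))  # [(0, 0, 2), (1, 2, 5)]
print(calendar(tasks, [1, 0]))  # None: task 0 would finish at 6 > 4
```

The same two tasks are feasible in one order and infeasible in the other, which is why the choice of sequence is the core of the scheduling problem.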

The approaches adopted to date for scheduling algorithms can generally be classified into two categories. One approach is to assign priorities to tasks so that the tasks can be scheduled according to their priorities [1, 7, 8, 10, 12, 15, 14]. This approach is called priority based scheduling. The priority can be determined by deadline, execution time, resource requirement, laxity, or period, or can be programmer-defined [4]. The other is the time based scheduling approach [9, 13]. A time based scheduler generates as output a calendar which specifies the time instants at which the tasks start and finish.

Generally speaking, scheduling aperiodic task sets without preemption is NP-complete [3]. Due to this intractability, several search algorithms [11, 17, 19, 20] have been proposed for computing optimal or suboptimal schedules. Analytic techniques may also be used for optimal scheduling. A dominance concept was proposed by Erschler et al. [2] to reduce the search space for checking the feasibility of task sets. They explored the relations among the tasks and determined the partial orderings of feasible schedules. Yuan and Agrawala [18] proposed decomposition methods that substantially reduce the search space based on the dominance concept. A task set is decomposed into subsets so that each subset can be scheduled independently. A super sequence is constructed to reduce the search space further. Saksena and Agrawala [13] investigated the technique of temporal analysis serving as a pre-processing stage for scheduling. The idea is to modify the windows of two partially ordered tasks, which are generated by the temporal relations, so that more partial orderings of tasks may be generated recursively.

The time based model is employed by several real-time operating systems currently being developed, including MARUTI [5], MARS [6], and Spring [16]. In this paper, we study an analytic approach to optimal scheduling under the time based model. When complicated timing constraints and task interdependency are taken into consideration, the schedulability analysis of priority based scheduling algorithms becomes much more difficult. We believe that, with an analytic approach, time based scheduling and its analysis require a reasonable amount of computation to produce a feasible schedule.

The rest of this paper is organized as follows. In Section 2, we describe how to compute the optimal schedule for a sequence. In Section 3, relaxing the restriction of total ordering on a sequence, we present an approach to computing the optimal schedule for a task set, together with related theorems. In Section 4, a simulation experiment is conducted to compare the performance of different algorithms. The last section presents our conclusions.

2  Scheduling  a  Sequence 

The size of a sequence (task set) is the number of tasks in the sequence (task set), and is denoted by $|S|$ ($|\Gamma|$). A sequence $S$ is feasible if all tasks in $S$ are executed in the order of the sequence and their timing constraints are satisfied. For convenience, we further define an instance $I$ to be a sequence such that $|I| = |\Gamma|$. We denote the instance $I$ by

$$I = (T_1^I, T_2^I, \ldots, T_n^I).$$

Notice that $\{\}$ is used to represent a task set, and $()$ a sequence. Let $T_i$ and $T_j$ be two tasks belonging to sequence $S$. If $T_i$ is located before $T_j$ in the sequence $S$, we say that $(T_i, T_j)$ conforms to $S$. A sequence $S_1$ conforms to a sequence $S_2$ if, for any $T_i$ and $T_j$, $(T_i, T_j)$ conforming to $S_1$ implies $(T_i, T_j)$ conforming to $S_2$. We use $\sigma(k)$ to represent the optimal schedule of $(T_1^I, T_2^I, \ldots, T_k^I)$ in the sense that for any feasible sequence $S$ conforming to $(T_1^I, T_2^I, \ldots, T_k^I)$, either

$$|S| < |\sigma(k)| \quad \text{or} \quad |S| = |\sigma(k)| \text{ and } f_S \ge f_{\sigma(k)}, \qquad (1)$$

where $f_S$ and $f_{\sigma(k)}$ are the finish times of $S$ and $\sigma(k)$, respectively. $\sigma(k)$ is thus the optimal schedule for the first $k$ tasks of $I$, and the optimal schedule for an instance $I$ can be represented by $\sigma(n)$. For simplicity, let $u_k = |\sigma(k)|$. In this section, we discuss scheduling for an instance; however, the approach is generally applicable to any sequence.

Figure 1: An instance $I = (T_1^I, T_2^I, \ldots, T_n^I)$.
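The conformance relation can be checked directly from its definition. In the Python sketch below (hypothetical helper names; tasks are represented by labels and assumed distinct within a sequence), $S_1$ conforms to $S_2$ exactly when every ordered pair taken from $S_1$ appears in the same order in $S_2$.

```python
from typing import Hashable, Sequence


def pair_conforms(a: Hashable, b: Hashable, s: Sequence[Hashable]) -> bool:
    """(a, b) conforms to s if a is located before b in s."""
    return a in s and b in s and s.index(a) < s.index(b)


def conforms(s1: Sequence[Hashable], s2: Sequence[Hashable]) -> bool:
    """s1 conforms to s2 if every pair conforming to s1 also conforms to s2."""
    return all(pair_conforms(s1[i], s1[j], s2)
               for i in range(len(s1)) for j in range(i + 1, len(s1)))


instance = ["T1", "T2", "T3", "T4"]
print(conforms(["T1", "T3", "T4"], instance))  # True: order preserved
print(conforms(["T3", "T1"], instance))        # False: T3 precedes T1
```

In these terms, $\sigma(k)$ is searched for among feasible sequences that conform to the first $k$ tasks of the instance, i.e. subsequences that keep the instance's order.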

2.1  Preliminary 

We assume that $r_i + c_i \le d_i$ holds for each task $T_i$ in the task set $\Gamma$. At first glance, one may attempt to compute $\sigma(k)$ based on $\sigma(k-1)$. However, on careful examination, we find that computing $\sigma(k-1)$ alone does not suffice to compute $\sigma(k)$. This is illustrated by the example in Figure 1. From this example, we obtain

$$\sigma(1) = (T_1^I), \qquad \sigma(2) = (T_1^I, T_2^I).$$

At the next step, $\sigma(2) \oplus (T_3^I)$ is not feasible, where the operator $\oplus$ denotes the concatenation of two sequences. One task must be rejected, which is $T_3^I$ in this case. Hence, we get

$$\sigma(3) = \sigma(2) = (T_1^I, T_2^I).$$

A problem arises at the next step: $\sigma(3) \oplus (T_4^I)$ is not feasible either. If we try to fix it by taking a task off this sequence, the result is

$$\sigma'(4) = \sigma(3) = (T_1^I, T_2^I).$$

However, the correct result is

$$\sigma(4) = (T_1^I, T_4^I).$$

Although $\sigma'(4)$ and $\sigma(4)$ are of the same size, the latter has a shorter finish time, which becomes significant at the next step. We get

$$\sigma(5) = \sigma(4) \oplus (T_5^I) = (T_1^I, T_4^I, T_5^I),$$

whereas with $\sigma'(4)$ we would have

$$\sigma'(5) = \sigma'(4) = (T_1^I, T_2^I).$$

This example shows that computing $\sigma(k-1)$ alone does not suffice to compute $\sigma(k)$: when $\sigma(k-1)$ is obtained, it cannot be predicted which tasks will be included in $\sigma(k)$. The approach therefore has to be modified as follows.
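The pitfall can be reproduced with a small hypothetical instance. Its parameters are invented for illustration and are not those of Figure 1, which the text does not give. The Python sketch below keeps only $\sigma(k-1)$ and rejects $T_k$ whenever appending it is infeasible. On this instance it keeps two tasks, although the sequence (T0, T3, T4) schedules three. The helper names and 0-based task labels are assumptions.

```python
from math import inf
from typing import List, Tuple

Task = Tuple[int, int, int]  # (ready time r, computation time c, deadline d)


def append_finish(finish: float, task: Task) -> float:
    """Finish time after appending `task` to a sequence finishing at `finish`; inf if infeasible."""
    r, c, d = task
    f = max(finish, r) + c
    return f if f <= d else inf


def naive_schedule(instance: List[Task]) -> List[int]:
    """Keep only sigma(k-1); when appending T_k is infeasible, reject T_k."""
    chosen, finish = [], 0
    for k, task in enumerate(instance):
        f = append_finish(finish, task)
        if f != inf:
            chosen.append(k)
            finish = f
    return chosen


def finish_of(instance: List[Task], order: List[int]) -> float:
    """Finish time of executing the given tasks in order (inf if infeasible)."""
    finish = 0
    for k in order:
        finish = append_finish(finish, instance[k])
    return finish


# Hypothetical instance T0..T4 as (r, c, d).
INSTANCE = [(0, 2, 4), (0, 3, 6), (1, 4, 5), (2, 1, 4), (4, 2, 6)]
print(naive_schedule(INSTANCE))         # [0, 1]
print(finish_of(INSTANCE, [0, 3, 4]))   # 6: three tasks fit if T1 is dropped for T3
```

As in the text, the naive pair (T0, T1) has the same size as (T0, T3) but a later finish time (5 versus 3), and that later finish is what blocks T4.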


2.2  Sequence-Scheduler  Algorithm 

We  denote  by  ^(/Cjy)  the  sequence  snch  that  S{kJ)  conforms  to  .  ,T/)  and  |S'(^:,i)l  = 

j,  where  j  <  |<r(^:)|.  S{kJ)  represents  any  sequence  of  j  tasks  picked  up  from  the  first  k 
tasks  of  S.  We  further  define  a  sequence,  denoted  by  a{kj),  to  be  the  optimal  schedule  with 
degree  j  for  ,  •  • . ,  Tl )  in  the  sense  that  for  any  feasible  sequence  S{k,pj,  we  have 


/<r(fc,i)  <  fs{k,j)- 


Notice  that  o'(A:)  is  an  abbreviation  of  cr(A:,  Uk)^  If  a  sequence  S{k^j')  is  not  feasible, 
fs{k,j)  =  oo. 

We would like to compute $\sigma(k,j)$ based on $\sigma(k-1,j')$, where $j' \le j \le |\sigma(k)|$. The basic idea is as follows. We know $\sigma(k,j)$ either contains $T_k^I$ or not. If so, then the other $j-1$ tasks are picked from the first $k-1$ tasks, and $\sigma(k-1,j-1)$ is one of the best choices. In this case, $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$. If $\sigma(k,j)$ does not contain $T_k^I$, all of the $j$ tasks must be picked from the first $k-1$ tasks, and $\sigma(k-1,j)$ is one of the best choices. In this case, $\sigma(k,j) = \sigma(k-1,j)$. Whether or not to take $T_k^I$ is determined by comparing which of the two sequences comes with a shorter finish time. Therefore, $\sigma(k,j)$ can be determined from $\sigma(k-1,j-1)$ and $\sigma(k-1,j)$. The computation of $\sigma(k-1,j)$ is in turn based on $\sigma(k-2,j-1)$ and $\sigma(k-2,j)$. In general, at each step $k$, we need to compute $\sigma(k,j)$ for $j = 1, 2, \ldots, |\sigma(k)|$. The algorithm Sequence-Scheduler in Figure 2 formalizes this idea. It is worth mentioning that the condition of the "while" statement in the algorithm is designed to let $j$ increase from 1 through $|\sigma(k)|$. The correctness is verified in the next section.
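The recurrence behind Sequence-Scheduler can be sketched in Python. This is a minimal sketch, not the report's implementation: it assumes each task is a hypothetical triple `(r, d, c)` (ready time, deadline, computation time), that tasks run nonpreemptively in instance order, that each task starts at the later of its ready time and the previous finish time, and that the processor is free from time 0. List position `j` holds $\sigma(k,j)$ as a pair (finish time, task indices), and a missing or infeasible sequence is given an infinite finish time.

```python
from math import inf
from typing import List, Tuple

# Hypothetical task encoding (not from the report): (ready time r, deadline d, computation time c).
Task = Tuple[float, float, float]


def append_finish(f_prev: float, task: Task) -> float:
    """Finish time of S ⊕ (T) given f_S = f_prev; inf if T would miss its deadline."""
    r, d, c = task
    f = max(f_prev, r) + c
    return f if f <= d else inf


def sequence_scheduler(instance: List[Task]) -> Tuple[float, List[int]]:
    """Return (finish time, task indices) of sigma(n) for the instance."""
    # sigma[j] = (finish time of sigma(k, j), its task indices); sigma[0] is the empty sequence.
    sigma: List[Tuple[float, List[int]]] = [(0, [])]  # assumption: processor free from time 0
    for k, task in enumerate(instance):
        u_prev = len(sigma) - 1  # u_{k-1}
        new_sigma: List[Tuple[float, List[int]]] = [(0, [])]
        j = 1
        # Mirrors the "while" test of Figure 2.
        while j <= u_prev or (j == u_prev + 1 and append_finish(sigma[u_prev][0], task) < inf):
            f_take = append_finish(sigma[j - 1][0], task)
            # sigma(k-1, u_{k-1}+1) does not exist, so its finish time is taken as infinity.
            f_skip = sigma[j][0] if j <= u_prev else inf
            if f_take < f_skip:
                new_sigma.append((f_take, sigma[j - 1][1] + [k]))
            else:
                new_sigma.append(sigma[j])
            j += 1
        sigma = new_sigma  # u_k = j - 1
    return sigma[-1]


if __name__ == "__main__":
    tasks = [(0, 10, 5), (0, 6, 4), (5, 12, 3)]
    print(sequence_scheduler(tasks))  # (8, [1, 2])
```

In this example $\sigma(3)$ keeps two tasks and finishes at time 8; all three tasks fit only if the second runs before the first, which does not conform to the instance. Only the row $\sigma(k-1,\cdot)$ is kept, since Theorem 1 needs nothing older.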


2.3  Verification  of  Sequence-Scheduler  Algorithm

The proof of the correctness of the algorithm, along with some related lemmas, is given below.

Lemma 1 Let $S_1$ and $S_2$ be two sequences such that $f_{S_1} \le f_{S_2}$. If $S_2 \oplus (T_x)$ is feasible, then $f_{S_1 \oplus (T_x)} \le f_{S_2 \oplus (T_x)}$.


Algorithm Sequence-Scheduler:

    Input: an instance I = (T_1^I, T_2^I, ..., T_n^I)
    Output: the optimal schedule σ(n) = σ(n, u_n) for I

    σ(0,0) := (); u_0 := 0
    for k := 1, 2, ..., n
        σ(k,0) := ()
        j := 1
        while (j ≤ u_{k-1}) or ((j = u_{k-1} + 1) and (σ(k-1, u_{k-1}) ⊕ (T_k^I) is feasible))
            if f_{σ(k-1, j-1) ⊕ (T_k^I)} < f_{σ(k-1, j)} then
                σ(k,j) := σ(k-1, j-1) ⊕ (T_k^I)
            else
                σ(k,j) := σ(k-1, j)
            endif
            j := j + 1
        endwhile
        u_k := j - 1
    endfor

Figure 2: Sequence-Scheduler Algorithm


Proof: This is straightforward via the following equations.

$$f_{S_1 \oplus (T_x)} = \max(f_{S_1}, r_{T_x}) + c_{T_x} \le \max(f_{S_2}, r_{T_x}) + c_{T_x} = f_{S_2 \oplus (T_x)}.$$

□

Corollary 1 Let $S_1$ and $S_2$ be two sequences such that $f_{S_1} \le f_{S_2}$. If $S_2 \oplus S_3$ is feasible, where $S_3$ is another sequence, then $f_{S_1 \oplus S_3} \le f_{S_2 \oplus S_3}$.

Proof: This is a direct result of applying Lemma 1 repeatedly through the tasks in $S_3$. □
Lemma 2 $u_k = u_{k-1}$ or $u_k = u_{k-1} + 1$.

Proof: It is obvious that $u_k \ge u_{k-1}$, where both $u_k$ and $u_{k-1}$ are integers. Let us assume that $u_k = u_{k-1} + \alpha$ and $\alpha \ge 2$. We are going to show that this assumption does not hold. We know $\sigma(k, u_k)$ either contains $T_k^I$ or not. If $\sigma(k, u_k)$ contains $T_k^I$, we can represent $\sigma(k, u_k)$ as $S(k-1, u_k - 1) \oplus (T_k^I)$ by picking a proper sequence $S(k-1, u_k - 1)$, which is feasible because it is a prefix of a feasible sequence. However, from the assumption above, we have $u_{k-1} = u_k - \alpha < u_k - 1 = |S(k-1, u_k - 1)|$. This contradicts the definition of $u_{k-1}$. On the other hand, if $\sigma(k, u_k)$ does not contain $T_k^I$, we can represent $\sigma(k, u_k)$ as $S(k-1, u_k)$. We have $u_{k-1} = u_k - \alpha < u_k$, which is a contradiction. The assumption thus does not hold. Therefore, we have $\alpha \le 1$. □

From this lemma, $\sigma(k,j)$ does exist for $j \le u_{k-1}$. Furthermore, in the algorithm, $j = u_{k-1} + 1$ is tested to see if $u_k = u_{k-1} + 1$.

Theorem 1 For $k = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, u_k$, if $f_{\sigma(k-1,j-1) \oplus (T_k^I)} < f_{\sigma(k-1,j)}$, then $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$; otherwise, $\sigma(k,j) = \sigma(k-1,j)$.

Proof: The proof is by induction on $k$. When $k = 1$, $I_1 = (T_1^I)$. Since $u_1 \le 1$ and $(T_1^I)$ is feasible, $\sigma(1,1) = (T_1^I)$. It is easy to come up with the same result through this theorem. Thus the base case holds. We assume that we can compute $\sigma(k-1,j)$, for $j = 1, 2, \ldots, u_{k-1}$, in the same way, and consider the case of $k$. Let us first bring forward three basic equations. Since $f_{\sigma(k-1,j-1)} \le f_{S(k-1,j-1)}$, the following equation holds by Lemma 1:

$$f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j-1) \oplus (T_k^I)}. \qquad (2)$$

By the induction hypothesis on $\sigma(k-1,j)$, which gives $f_{\sigma(k-1,j)} \le f_{S(k-1,j)}$, we have

$$\text{if } f_{\sigma(k-1,j-1) \oplus (T_k^I)} < f_{\sigma(k-1,j)}, \text{ then } f_{\sigma(k-1,j-1) \oplus (T_k^I)} < f_{S(k-1,j)}. \qquad (3)$$

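From Equation 2, we have

$$\text{if } f_{\sigma(k-1,j)} \le f_{\sigma(k-1,j-1) \oplus (T_k^I)}, \text{ then } f_{\sigma(k-1,j)} \le f_{S(k-1,j-1) \oplus (T_k^I)}. \qquad (4)$$

From Lemma 2, we know either $u_k = u_{k-1} + 1$ or $u_k = u_{k-1}$. The two cases are discussed below.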

Case I: $u_k = u_{k-1} + 1$. We first discuss the situation when $j = 1, 2, \ldots, u_k - 1$. We know that a feasible sequence $S(k,j)$ is either in the form of $S(k-1,j)$ or $S(k-1,j-1) \oplus (T_k^I)$. If $f_{\sigma(k-1,j-1) \oplus (T_k^I)} < f_{\sigma(k-1,j)}$, then $f_{\sigma(k-1,j-1) \oplus (T_k^I)} < f_{S(k-1,j)}$ by Equation 3. Also we have $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j-1) \oplus (T_k^I)}$ by Equation 2. This means $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k,j)}$ for any feasible sequence $S(k,j)$. Consequently, $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$, which justifies the theorem. On the other hand, if $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \ge f_{\sigma(k-1,j)}$, then $f_{\sigma(k-1,j)} \le f_{S(k-1,j-1) \oplus (T_k^I)}$ by Equation 4. In addition, $f_{\sigma(k-1,j)} \le f_{S(k-1,j)}$ by the induction hypothesis on $\sigma(k-1,j)$. So $f_{\sigma(k-1,j)} \le f_{S(k,j)}$ for any feasible sequence $S(k,j)$. In this case, $\sigma(k,j) = \sigma(k-1,j)$, which justifies the theorem.

Then we discuss the situation when $j = u_k$. Since $u_k = u_{k-1} + 1$, it is clear that $T_k^I$ belongs to $\sigma(k)$: otherwise, we would need to pick $u_{k-1} + 1$ tasks from $I_{k-1}$ to make a feasible sequence, which violates the definition of $u_{k-1}$. Therefore, $\sigma(k,j)$ can be expressed as $S(k-1, u_{k-1}) \oplus (T_k^I)$ by picking a proper sequence $S(k-1, u_{k-1})$. Note that $u_{k-1} = j - 1$ here. By Equation 2, we have $f_{\sigma(k-1,j-1) \oplus (T_k^I)} \le f_{S(k-1,j-1) \oplus (T_k^I)}$ for any sequence $S(k-1,j-1) \oplus (T_k^I)$. Thus, $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$. Now let us check the theorem. The sequence $\sigma(k-1,j) = \sigma(k-1, u_{k-1}+1)$ is not feasible; thus its finish time is $\infty$. The condition $f_{\sigma(k-1,j-1) \oplus (T_k^I)} < f_{\sigma(k-1,j)}$ is satisfied. So $\sigma(k,j) = \sigma(k-1,j-1) \oplus (T_k^I)$. This justifies the theorem.

Case II: $u_k = u_{k-1}$. The reasoning follows the discussion of the first part of Case I. □


3  Scheduling  a  Task  Set 


In this section we discuss how to schedule a task set by using Sequence-Scheduler. The optimal schedule, $\rho$, of a task set $\Gamma$ is defined as follows: for any feasible sequence $S$ consisting of tasks in $\Gamma$, we have either

$$|S| < |\rho|,$$

or

$$|S| = |\rho| \text{ and } f_S \ge f_\rho.$$

For simplicity, we use "optimal schedule" to denote the optimal schedule of the task set when there is no confusion. Note that the optimal schedule of the task set is the best of the optimal schedules of all instances in the task set. Erschler et al. [2] proposed the dominance concept to reduce the number of permutations that must be examined for the feasibility test of a task set. Yuan and Agrawala [18] proposed the super sequence to further reduce the search space for testing the feasibility of a task set. In this section, we show that for our optimization problem, the super sequence provides a valid and pruned search space. In other words, there exists an optimal schedule which conforms to an instance in the super sequence of the task set. Thus we may use Sequence-Scheduler to schedule the instances extracted from the super sequence to derive the optimal schedule. There may exist more than one optimal schedule for a task set; our interest is in deriving one of them.

3.1  Super  Sequence 

Temporal relations between two tasks $T_i$ and $T_j$ are summarized in the following. They are illustrated by Figure 3.

•  leading: $T_i \prec T_j$, if $r_i \le r_j$ and $d_i \le d_j$, but the two equalities do not hold at the same time.

•  matching: $T_i \parallel T_j$, if $r_i = r_j$ and $d_i = d_j$.

•  containing: $T_i \sqsupset T_j$, if $r_i < r_j$ and $d_i > d_j$.

Figure 3: (i) $T_i \prec T_j$; (ii) $T_i \parallel T_j$; (iii) $T_i \sqsupset T_j$

A task $h$ is called a top task if there is no task contained by $h$. A task is called a nontop task if it contains at least one task. Assume that we have $t$ top tasks in the task set, denoted by $h_1, h_2, \ldots, h_t$ respectively. Denote by $M_k$ the set of tasks that contain the top task $h_k$, including $h_k$ itself, and by $\bar{M}_k$ the set of tasks in the task set $\Gamma$ that do not belong to $M_k$. We say that $T_i$ is weakly leading to $T_j$, denoted by $T_i \trianglelefteq T_j$, if $T_i \prec T_j$ or $T_i \parallel T_j$. If $T_i \trianglelefteq T_j$ for all $T_j$ belonging to a sequence $S$, then we write $T_i \trianglelefteq S$.

The dominance concept was originally developed by Erschler et al. [2] to reduce the search space for testing the feasibility of a task set. The idea is extended by the super sequence proposed by Yuan and Agrawala [18]. An instance $I$ dominates an instance $I'$ iff:

$$I' \text{ feasible} \Rightarrow I \text{ feasible}.$$

That is, $I$ can be considered a better candidate for a feasible schedule than $I'$. A dominant instance is an instance such that, for each possible instance $I$ of the task set, if $I$ dominates the dominant instance, then the dominant instance dominates $I$. Thus the dominant instance can be considered the best candidate for a feasible schedule. A set of instances is said to be a dominant set if, for every instance $I$ that does not belong to the set, there exists a dominant instance in the set that dominates $I$.


A super sequence $\Delta$ serves similarly to a dominant set in that there exists a dominant instance in the super sequence, and it is more appropriate for solving our problem. A super sequence is a sequence of tasks in which duplicates of tasks are allowed. The purpose is to extract instances from the super sequence for scheduling. The super sequence is constructed according to the dominant rules [2, 18] described below. Whenever a task satisfies one of the conditions specified by the rules, a duplicate of the task is inserted into the super sequence. Note that duplicates can only be generated for nontop tasks. The top tasks appear once and only once in the super sequence.

Rule R1: Let $T_\alpha$ and $T_\beta$ be any two top tasks. If $T_\alpha \prec T_\beta$, then $T_\alpha$ is positioned before $T_\beta$. If $T_\alpha \parallel T_\beta$, the order of the two top tasks is determined arbitrarily.

A unique order of the top tasks can thus be determined for the super sequence. Let us denote the sequence composed of the top tasks by $H = (h_1, h_2, \ldots, h_t)$. The rule implies that if $T_\alpha$ is positioned before $T_\beta$ in the super sequence, then $T_\alpha \trianglelefteq T_\beta$. So $h_1 \trianglelefteq h_2 \trianglelefteq \cdots \trianglelefteq h_t$.

Rule R2:

(1) A nontop task can be positioned before the first top task $h_1$ only when it contains $h_1$.

(2) A nontop task can be positioned after the last top task $h_t$ only when it contains $h_t$.

(3) A nontop task can be positioned between $h_k$ and $h_{k+1}$ only when it contains $h_k$ or $h_{k+1}$.

The $t$ top tasks delimit the super sequence into $t + 1$ regions by rule R1. Now we have $t + 1$ subsets of nontop tasks separated by the $t$ top tasks by rule R2. Generally speaking, a nontop task has more than one possible location. Denote the $k$th subset by $A_k$, which lies between the top tasks $h_k$ and $h_{k+1}$. From rule R2, it can be deduced that

$$A_k = B_{k,\overline{k+1}} \cup B_{k,k+1} \cup B_{\bar{k},k+1}, \qquad (5)$$

where

$$B_{k,\overline{k+1}} = M_k \cap \bar{M}_{k+1}, \quad B_{k,k+1} = M_k \cap M_{k+1}, \quad B_{\bar{k},k+1} = \bar{M}_k \cap M_{k+1}.$$

The next rule specifies the order of the tasks within each subset.


Rule R3: In each subset $A_k$, for $k = 0, 1, \ldots, t$,

(1) the tasks in $B_{k,\overline{k+1}}$ are ordered according to their deadlines, and tasks with the same deadline are ordered arbitrarily,

(2) the tasks in $B_{k,k+1}$ are ordered arbitrarily,

(3) the tasks in $B_{\bar{k},k+1}$ are ordered according to their ready times, and tasks with the same ready time are ordered arbitrarily,

(4) the tasks in $B_{k,\overline{k+1}}$ are positioned before those in $B_{k,k+1}$, which in turn are positioned before those in $B_{\bar{k},k+1}$.
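Rule R3 can be sketched as a stable three-way partition of a subset $A_k$. This sketch assumes $1 \le k < t$, so that both neighboring top tasks exist, and a hypothetical task triple `(r, d, c)`; membership in $M_k$ is tested with the containing relation, since the nontop tasks of $A_k$ are never $h_k$ itself. Python's `sorted` is stable, so ties keep their input order, which is one admissible choice for the arbitrary orders that R3 allows.

```python
from typing import List, Tuple

Task = Tuple[float, float, float]  # hypothetical encoding: (ready time r, deadline d, computation time c)


def contains(a: Task, b: Task) -> bool:
    """a ⊐ b: r_a < r_b and d_a > d_b."""
    return a[0] < b[0] and a[1] > b[1]


def order_subset(subset: List[Task], h_k: Task, h_next: Task) -> List[Task]:
    """Order the nontop tasks of A_k (between top tasks h_k and h_{k+1}) by rule R3."""
    only_k = [t for t in subset if contains(t, h_k) and not contains(t, h_next)]
    both = [t for t in subset if contains(t, h_k) and contains(t, h_next)]
    only_next = [t for t in subset if not contains(t, h_k) and contains(t, h_next)]
    return (sorted(only_k, key=lambda t: t[1])         # R3(1): by deadline
            + both                                      # R3(2): any order
            + sorted(only_next, key=lambda t: t[0]))    # R3(3): by ready time; R3(4): group order


if __name__ == "__main__":
    h_k, h_next = (2, 6, 2), (8, 12, 2)
    subset = [(0, 20, 2), (7, 14, 1), (1, 9, 1), (5, 13, 1), (1, 7, 1)]
    print(order_subset(subset, h_k, h_next))
```

By rule R2 every task in $A_k$ contains $h_k$ or $h_{k+1}$, so the three groups together cover the subset.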

Now we are ready to construct the super sequence with these three rules. Top tasks are first picked out and ordered, forming $t + 1$ regions. In each region, there is a subsequence of nontop tasks. An instance extracted from the super sequence is one that conforms to the super sequence without duplication of tasks. Let $q$ be the number of top tasks that a nontop task contains. The number of possible regions the nontop task can fall into is $q + 1$. The number of instances in the super sequence thus amounts to

$$N = \prod_{q=1}^{t} (q+1)^{n_q},$$

where $n_q$ is the number of nontop tasks which contain $q$ top tasks. Compared with an exhaustive search, which takes up to $n!$ instances (permutations) into account, the super sequence generally leads to a smaller set. Notice that it takes $O(n)$ time to check whether an instance is feasible. Hence, the time complexity of the feasibility test for the task set is $O(N \cdot n)$.
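The size $N$ of the pruned search space can be computed from the containing relations alone. This is a minimal sketch assuming a hypothetical task triple `(r, d, c)`; `math.prod` requires Python 3.8 or later. Each nontop task is mapped to the number $q$ of top tasks it contains, and the counts $n_q$ are combined as in the formula for $N$.

```python
from collections import Counter
from math import factorial, prod
from typing import List, Tuple

Task = Tuple[float, float, float]  # hypothetical encoding: (ready time r, deadline d, computation time c)


def contains(a: Task, b: Task) -> bool:
    """a ⊐ b: r_a < r_b and d_a > d_b."""
    return a[0] < b[0] and a[1] > b[1]


def instance_count(tasks: List[Task]) -> int:
    """N = prod over q of (q + 1)^{n_q}: the number of instances in the super sequence."""
    n = len(tasks)
    top = [i for i in range(n)
           if not any(contains(tasks[i], tasks[j]) for j in range(n) if j != i)]
    # q for every nontop task: how many top tasks it contains
    q_values = [sum(contains(tasks[i], tasks[h]) for h in top)
                for i in range(n) if i not in top]
    n_q = Counter(q_values)
    return prod((q + 1) ** count for q, count in n_q.items())


if __name__ == "__main__":
    tasks = [(0, 20, 2), (2, 6, 2), (8, 12, 2), (1, 7, 1)]
    print(instance_count(tasks), factorial(len(tasks)))  # 6 24
```

Here $(0, 20, 2)$ contains both top tasks ($q = 2$) and $(1, 7, 1)$ contains one ($q = 1$), so $N = 3 \cdot 2 = 6$ instances against $4! = 24$ permutations. A task set with no nontop tasks gives $N = 1$, the empty product.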

3.2  Leading  Theorem 

The super sequence is not only useful in testing the feasibility of a task set; we will show that it is also useful in reducing the number of instances to be examined in order to obtain the optimal schedule of a task set. We will show that there exists at least one optimal schedule which conforms to an instance in the super sequence $\Delta$. Hence, it suffices to check through $\Delta$ to obtain the optimal schedule of $\Gamma$.


It is worth noting that the top tasks in $\rho$ may not be the same as the top tasks of $\Gamma$. This arises because some of the top tasks of $\Gamma$ may be rejected, introducing new top tasks in $\rho$. Before proceeding to verify the rules for the super sequence, we first introduce the Leading Theorem. It serves as the basis for further analysis in the Dominance Theorem and the Conformation Theorem described later. The Leading Theorem states that, under a certain condition, we can adjust the order of tasks to satisfy the Weakly Leading Condition (defined below) without introducing a schedule with a greater finish time.

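Assume that $S$ is a feasible sequence, with $L_{pre}$, $L$, and $L_{post}$ subsequences of $S$ such that

$$S = L_{pre} \oplus L \oplus L_{post}.$$

Let us denote $L$ by

$$L = (T_\beta, T_{x_1}, T_{x_2}, \ldots, T_{x_w}, T_\alpha),$$

where $w \ge 0$. A frame $F$ is defined to be a time interval characterized by a beginning time $b_F$ and an ending time $e_F$. We say that $F$ is a frame corresponding to $L$ if $b_F = s_\beta$ and $e_F = f_\alpha$, where $s_\beta$ is the starting time of $T_\beta$, and $f_\alpha$ is the finish time of $T_\alpha$. By analogy with the containing relation, we write $F \sqsupset T_x$ if $b_F < r_x$ and $d_x < e_F$.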

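Theorem 2 (Leading Theorem) Assume that $S = L_{pre} \oplus L \oplus L_{post}$ is a feasible sequence, where $L = (T_\beta, T_{x_1}, T_{x_2}, \ldots, T_{x_w}, T_\alpha)$. Let $F$ be a frame corresponding to $L$. If $T_\alpha \prec T_\beta$, and there does not exist a task $T_{x_i}$, $1 \le i \le w$, such that $F \sqsupset T_{x_i}$, then there exists a sequence $\hat{L}$ which is a permutation of $L$ such that
(i) $(T_\alpha, T_\beta)$ conforms to $\hat{L}$, and
(ii) $f_{L_{pre} \oplus \hat{L} \oplus L_{post}} \le f_{L_{pre} \oplus L \oplus L_{post}}$.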

Before we proceed to prove the theorem, the following definition is useful.

Weakly Leading Condition: a sequence $S = (T_{y_1}, T_{y_2}, \ldots, T_{y_m})$ satisfies the Weakly Leading Condition if $T_{y_1} \trianglelefteq T_{y_2} \trianglelefteq \cdots \trianglelefteq T_{y_m}$.

Lemma 3 Let $S$ be a sequence satisfying the Weakly Leading Condition. If $(T_i, T_j)$ conforms to $S$ and $T_i \parallel T_j$, then all tasks located between $T_i$ and $T_j$ in $S$ must match $T_i$ and $T_j$.

Proof: For any task $T_x$ located between $T_i$ and $T_j$, according to the definition of the Weakly Leading Condition, we have $r_i \le r_x \le r_j$ and $d_i \le d_x \le d_j$. Since $T_i \parallel T_j$, $r_i = r_j$ and $d_i = d_j$. Therefore, we have $r_i = r_x = r_j$ and $d_i = d_x = d_j$. So $T_x$ matches $T_i$ and $T_j$. □

To obtain $\hat{L}$, let us modify the tasks in $L$ in the following way. If the ready time of a task is less than $b_F$, then its ready time is set to $b_F$. If the deadline of a task is greater than $e_F$, then its deadline is set to $e_F$. The computation times remain unchanged. Let $L'$ be the sequence consisting of the modified tasks in the same order as $L$, i.e.,

$$L' = (T'_\beta, T'_{x_1}, T'_{x_2}, \ldots, T'_{x_w}, T'_\alpha).$$

Since $T_\alpha \prec T_\beta$, $d_\beta \ge d_\alpha \ge f_\alpha = e_F$. So $d'_\beta = d'_\alpha = e_F$. Also $r_\alpha \le r_\beta \le s_\beta = b_F$, so $r'_\alpha = r'_\beta = b_F$. This is illustrated in Fig. 4 (ii).

Note that swapping $T_\beta$ and $T_\alpha$ in the sequence does not result in a feasible sequence in this example. It is essential that we adjust the order of the tasks located between them. Let $\hat{L}'$ be a sequence which is a permutation of $L'$, satisfies the Weakly Leading Condition, and to which $(T'_\alpha, T'_\beta)$ conforms. Furthermore, $\hat{L}'$ satisfies an even stronger condition: if $T'_{y_i}$ is positioned before $T'_{y_j}$ in $\hat{L}'$, then $T'_{y_i} \trianglelefteq T'_{y_j}$; if, furthermore, $T'_{y_i} \parallel T'_{y_j}$, the corresponding original tasks $T_{y_i}$ and $T_{y_j}$ satisfy $T_{y_i} \trianglelefteq T_{y_j}$. The idea of such an arrangement is that, when interchanging $T_\beta$ and $T_\alpha$, we do not produce a new reversed pair like them. By a reversed pair we mean, for example, that $T_\alpha \prec T_\beta$ but $T_\beta$ is positioned before $T_\alpha$ in the sequence. So, if $(T'_{y_i}, T'_{y_j})$ conforms to $\hat{L}'$, the corresponding tasks satisfy the condition that either $T_{y_i} \prec T_{y_j}$, or $T_{y_i} \sqsupset T_{y_j}$, or $T_{y_j} \sqsupset T_{y_i}$. One possibility for $\hat{L}'$ is illustrated in Fig. 4 (iii).

The existence of such a sequence is proved later. Finally, $\hat{L}$ can be taken as a sequence in the same order as $\hat{L}'$, but with the ready times and deadlines of the tasks recovered to their original settings. This is illustrated in Fig. 4 (iv). The figures give a rough idea of how the task order can be adjusted to satisfy the conditions described in the Leading Theorem. The proof of the Leading Theorem follows.


Proof (of Leading Theorem): We first show the existence of $\hat{L}$. The ready times and deadlines of the tasks in $L'$ are modified in such a way that their start times are not affected. In addition, their computation times remain the same. It is clear that $L'$ is feasible, and

$$f_{L_{pre} \oplus L'} = f_{L_{pre} \oplus L}.$$
We can obtain $\hat{L}'$ in the following way. At the first step, the first task $T'_{y_1}$ of $\hat{L}'$ is the task in $L'$ such that, for any task $T'_x$ belonging to $L'$, $T'_{y_1} \trianglelefteq T'_x$. Such a task $T'_{y_1}$ exists because there are no containing relations among the modified tasks, and ties can be broken arbitrarily. (Since $F$ contains no $T_{x_i}$, every modified task has $r' = b_F$ or $d' = e_F$, so no modified task can contain another.) $T'_{y_1}$ is exchanged with the task located just to its left in $L'$. We continue the exchanging process until $T'_{y_1}$ occupies the first location in the sequence. At the second step, the second task $T'_{y_2}$ of $\hat{L}'$ is the task in $L'$ such that, for any task $T'_x$ belonging to $L'$ except $T'_{y_1}$, $T'_{y_2} \trianglelefteq T'_x$. We exchange $T'_{y_2}$ with its left neighbor consecutively until it occupies the second location in the sequence. At the $i$th step, the $i$th task $T'_{y_i}$ of $\hat{L}'$ is the task in $L'$ such that, for any task $T'_x$ belonging to $L'$ except $T'_{y_1}$ through $T'_{y_{i-1}}$, $T'_{y_i} \trianglelefteq T'_x$. We exchange $T'_{y_i}$ with its left neighbor consecutively until it occupies the $i$th location in the sequence. We keep performing this operation until we finally obtain $\hat{L}'$. Inserting $T'_{y_i}$, $1 \le i \le |L'|$, into the $i$th position of the sequence by consecutive swapping is possible because $T'_{y_i} \trianglelefteq T'_x$ for all $T'_x$ not belonging to $\{T'_{y_1}, \ldots, T'_{y_{i-1}}\}$. In short, the adjustment is possible because there are no containing relations among the modified tasks, and hence there exists a total ordering of the modified tasks by the Weakly Leading Condition. The resultant $\hat{L}'$ exists and is a sequence satisfying the Weakly Leading Condition.
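The construction of $\hat{L}'$ in this proof can be sketched as follows: clamp each task of $L$ to the frame to obtain $L'$, then, step by step, pick a task that weakly leads every task not yet placed and move it left by adjacent swaps. This is an illustrative sketch only: tasks are hypothetical `(r, d, c)` triples, the input values are made up, and the bookkeeping that keeps $T'_\alpha$ before $T'_\beta$ when they match is omitted. If some modified task contained another, no such pick would exist and `next` would raise `StopIteration`; the frame condition of the Leading Theorem rules this out.

```python
from typing import List, Tuple

Task = Tuple[float, float, float]  # hypothetical encoding: (ready time r, deadline d, computation time c)


def clamp_to_frame(seq: List[Task], b_f: float, e_f: float) -> List[Task]:
    """Build L': ready times below b_F become b_F, deadlines above e_F become e_F."""
    return [(max(r, b_f), min(d, e_f), c) for r, d, c in seq]


def weakly_leads(a: Task, b: Task) -> bool:
    """a ⊴ b, which is equivalent to r_a <= r_b and d_a <= d_b."""
    return a[0] <= b[0] and a[1] <= b[1]


def reorder_weakly_leading(seq: List[Task]) -> List[Task]:
    """Adjacent-swap construction of a permutation satisfying the Weakly Leading Condition."""
    seq = list(seq)
    for i in range(len(seq)):
        # Step i: a task that weakly leads every task not yet placed (positions i..end).
        m = next(p for p in range(i, len(seq))
                 if all(weakly_leads(seq[p], seq[q]) for q in range(i, len(seq))))
        # Move it to position i by exchanging it with its left neighbor repeatedly.
        for p in range(m, i, -1):
            seq[p - 1], seq[p] = seq[p], seq[p - 1]
    return seq


if __name__ == "__main__":
    l_prime = clamp_to_frame([(2, 12, 3), (1, 8, 2), (0, 15, 1)], b_f=2, e_f=10)
    print(reorder_weakly_leading(l_prime))  # [(2, 8, 2), (2, 10, 3), (2, 10, 1)]
```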

It is possible that $(T'_\beta, T'_\alpha)$ conforms to $\hat{L}'$. Since $T'_\alpha \parallel T'_\beta$, by Lemma 3 all tasks located between $T'_\beta$ and $T'_\alpha$ must match each other. Hence the order of these tasks does not make any difference. We can thus exchange the positions of $T'_\beta$ and $T'_\alpha$, which makes $(T'_\alpha, T'_\beta)$ conform to $\hat{L}'$.

During the process of adjusting the position of $T'_{y_i}$, $1 \le i \le |\hat{L}'|$, $T'_{y_i}$ leads or matches every task in the sequence except $T'_{y_1}, \ldots, T'_{y_{i-1}}$. Thus we can apply Lemma 4, described next, which ensures that the resultant sequence after swapping $T'_{y_i}$ to the $i$th location comes with a shorter or equal finish time. This explains

$$f_{L_{pre} \oplus \hat{L}'} \le f_{L_{pre} \oplus L'} = f_{L_{pre} \oplus L}.$$

$\hat{L}$ is a sequence in the same order as $\hat{L}'$, but the ready times and deadlines of the tasks are recovered to their original values. Each task in $\hat{L}$ can be started no later than the starting time of the same task in $\hat{L}'$. Consequently,

$$f_{L_{pre} \oplus \hat{L}} \le f_{L_{pre} \oplus \hat{L}'}.$$

By Corollary 1, we have

$$f_{L_{pre} \oplus \hat{L} \oplus L_{post}} \le f_{L_{pre} \oplus L \oplus L_{post}}.$$

□


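Lemma 4 Assume that $S_1 \oplus S_2 \oplus (T_j) \oplus S_3$ is feasible, where $S_1$, $S_2$, and $S_3$ are sequences. If $T_j \trianglelefteq S_2$, then $f_{S_1 \oplus (T_j) \oplus S_2 \oplus S_3} \le f_{S_1 \oplus S_2 \oplus (T_j) \oplus S_3}$.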

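Proof: We will prove the lemma by induction on $|S_2|$. When $|S_2| = 0$, it is trivially true. Assume that it is true when $|S_2| = k$. We would like to show that it is true when $|S_2| = k + 1$. Let $S_2 = (T_i) \oplus S_2'$, where $|S_2'| = k$; i.e.,

$$S_1 \oplus S_2 \oplus (T_j) \oplus S_3 = S_1 \oplus (T_i) \oplus S_2' \oplus (T_j) \oplus S_3.$$

We can view $S_1 \oplus (T_i)$ as a single sequence, and because $|S_2'| = k$, by the induction hypothesis we have

$$f_{S_1 \oplus (T_i) \oplus (T_j) \oplus S_2' \oplus S_3} \le f_{S_1 \oplus (T_i) \oplus S_2' \oplus (T_j) \oplus S_3}.$$

By definition,

$$f_{S_1 \oplus (T_i) \oplus (T_j)} = \max(\max(f_{S_1}, r_i) + c_i, r_j) + c_j = \max(f_{S_1} + c_i + c_j,\; r_i + c_i + c_j,\; r_j + c_j).$$

Since $T_j \trianglelefteq S_2$, which indicates that $T_j \trianglelefteq T_i$, we have $r_j \le r_i$, and hence $r_j + c_j \le r_i + c_i + c_j$. Therefore

$$f_{S_1 \oplus (T_i) \oplus (T_j)} = \max(f_{S_1} + c_i + c_j,\; r_i + c_i + c_j).$$

On the other hand,

$$f_{S_1 \oplus (T_j) \oplus (T_i)} = \max(\max(f_{S_1}, r_j) + c_j, r_i) + c_i = \max(f_{S_1} + c_i + c_j,\; r_j + c_i + c_j,\; r_i + c_i).$$

Because $r_j + c_i + c_j \le r_i + c_i + c_j$ and $r_i + c_i \le r_i + c_i + c_j$, we have

$$f_{S_1 \oplus (T_j) \oplus (T_i)} \le f_{S_1 \oplus (T_i) \oplus (T_j)}.$$

By Corollary 1,

$$f_{S_1 \oplus (T_j) \oplus (T_i) \oplus S_2' \oplus S_3} \le f_{S_1 \oplus (T_i) \oplus (T_j) \oplus S_2' \oplus S_3}.$$

Therefore,

$$f_{S_1 \oplus (T_j) \oplus S_2 \oplus S_3} \le f_{S_1 \oplus S_2 \oplus (T_j) \oplus S_3}.$$

□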
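Lemma 4 can also be exercised numerically. The sketch below is an illustrative randomized check, assuming hypothetical `(r, d, c)` task triples, nonpreemptive execution with the processor free from time 0, and an infinite finish time for any sequence that misses a deadline. Whenever $S_1 \oplus S_2 \oplus (T_j) \oplus S_3$ is feasible and $T_j \trianglelefteq S_2$, moving $T_j$ in front of $S_2$ must not increase the finish time. Passing such a test is only evidence; the proof above is what establishes the lemma.

```python
import random
from math import inf
from typing import List, Tuple

Task = Tuple[int, int, int]  # hypothetical encoding: (ready time r, deadline d, computation time c)


def finish(seq: List[Task], start: float = 0) -> float:
    """Finish time of running seq nonpreemptively; inf if any deadline is missed."""
    f = start
    for r, d, c in seq:
        f = max(f, r) + c
        if f > d:
            return inf
    return f


def weakly_leads_all(tj: Task, seq: List[Task]) -> bool:
    """T_j ⊴ S: r_j <= r and d_j <= d for every task in S."""
    return all(tj[0] <= r and tj[1] <= d for r, d, _ in seq)


def random_task(rng: random.Random) -> Task:
    r, c = rng.randint(0, 10), rng.randint(1, 4)
    return (r, r + c + rng.randint(0, 15), c)


def check_lemma4(trials: int = 10000, seed: int = 1) -> int:
    """Assert Lemma 4 on random cases that meet its hypotheses; return how many were checked."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        s1, s2, s3 = [[random_task(rng) for _ in range(rng.randint(0, 3))] for _ in range(3)]
        tj = random_task(rng)
        before = finish(s1 + s2 + [tj] + s3)
        if before == inf or not weakly_leads_all(tj, s2):
            continue  # hypotheses of Lemma 4 not met
        after = finish(s1 + [tj] + s2 + s3)
        assert after <= before, (s1, s2, tj, s3)
        checked += 1
    return checked


if __name__ == "__main__":
    s1, tj, s2, s3 = [(0, 10, 2)], (1, 20, 3), [(4, 20, 2), (5, 25, 1)], [(9, 30, 2)]
    print(finish(s1 + s2 + [tj] + s3), finish(s1 + [tj] + s2 + s3))  # 12 11
    print(check_lemma4() > 0)  # True
```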


3.3  Dominance  Theorem 

The super sequence is constructed for the feasibility test of a task set. If a task set is feasible, we say that there exists a full schedule of the task set. There may exist more than one full schedule for a given task set. An optimal full schedule is a full schedule whose finish time is the shortest among all possible full schedules. Note that a full schedule is a feasible instance. In this section, we prove that if a task set is feasible, there exists an optimal full schedule conforming to the super sequence.¹ Hence, the super sequence provides a valid and pruned search space for deriving the optimal full schedule of a task set.

¹In [2], Erschler et al.'s theorem implied a similar result: if a task set is feasible, there exists a full schedule in the dominant set. Our theorem further shows that there exists such a full schedule, with the minimum finish time among all full schedules, that conforms to the super sequence. We prove the existence of such an optimal full schedule in a more systematic way.


Theorem 3 Assume that the task set $\Gamma$ is feasible and $\rho$ is an optimal full schedule of $\Gamma$. Let $T_\alpha$ and $T_\beta$ be two top tasks of $\rho$ such that $T_\alpha \prec T_\beta$. If $(T_\beta, T_\alpha)$ conforms to $\rho$, then there exists another optimal full schedule $\rho'$ such that $(T_\alpha, T_\beta)$ conforms to $\rho'$.

Proof: $T_\alpha$ and $T_\beta$ are two top tasks. Let $F$ be a frame such that $b_F = s_\beta$ and $e_F = f_\alpha$. $T_\alpha \prec T_\beta$ means $b_F = s_\beta \ge r_\beta \ge r_\alpha$, and $e_F = f_\alpha \le d_\alpha$. If there exists a task $T_x$ such that $F \sqsupset T_x$, then $T_\alpha \sqsupset T_x$ too. This contradicts the fact that $T_\alpha$ is a top task. Hence $F$ cannot contain any task. By the Leading Theorem, there exists another sequence $\rho'$ such that $(T_\alpha, T_\beta)$ conforms to $\rho'$, and both $|\rho'| = |\rho|$ and $f_{\rho'} \le f_\rho$ hold, which means $\rho'$ is an optimal full schedule too. □

When two tasks match each other, it does not matter which task is executed first. This gives rise to the following corollary.

Corollary 2 Assume that the task set $\Gamma$ is feasible and $\rho$ is an optimal full schedule of $\Gamma$. Also assume that $(h_1, \ldots, T_\beta, T_\alpha, \ldots, h_t)$, the subsequence of the top tasks in $\rho$, conforms to $\rho$. If $T_\alpha \trianglelefteq T_\beta$, then there exists another optimal full schedule $\rho'$ such that $(h_1, \ldots, T_\alpha, T_\beta, \ldots, h_t)$ conforms to $\rho'$.

Proof: Theorem 3 also holds when $T_\alpha \trianglelefteq T_\beta$, because when two tasks match each other, the execution order of the two tasks is arbitrary. Also, by looking at the adjustment process of the Leading Theorem, we can see that the tasks located before and after $T_\alpha$ and $T_\beta$ are not adjusted. This verifies the corollary. □


Corollary 3 Let $H = (h_1, h_2, \ldots, h_t)$ be the top tasks of the task set $\Gamma$ such that $h_1 \trianglelefteq h_2 \trianglelefteq \cdots \trianglelefteq h_t$. If $\Gamma$ is feasible, there exists an optimal full schedule $\rho'$ to which $H$ conforms.

Proof: Since $\Gamma$ is feasible, there exists an optimal full schedule $\rho$. Let $K = (k_1, k_2, \ldots, k_t)$ be a sequence which is a permutation of $H$ such that $K$ conforms to $\rho$. We would like to adjust the order of the tasks in $K$ so that $K$ is transformed successively into $H$. We locate the task $h_x$ in $K$, where $x$ is chosen in the order 1 through $t$, and move it to the $x$th position in $K$ by consecutively swapping $h_x$ with its left neighbor. This leads to the sequence $H$. During the swapping, $h_x$ always weakly leads its left neighbor, since $h_1, \ldots, h_{x-1}$ are already in positions $1, \ldots, x-1$. By Corollary 2, there always exists an optimal full schedule to which the intermediate resultant sequences conform. Therefore, there exists an optimal full schedule $\rho'$ to which $(h_1, h_2, \ldots, h_t)$ conforms. □

Given an optimal full schedule $\rho$, by Corollary 3 we can always obtain another optimal full schedule $\rho'$ in which the top tasks are ordered according to the weakly leading relations. Therefore rule R1 is verified.

Before we go further, the following definitions are useful. Let $h_k$ be a top task and $T_x$ a nontop task of a sequence $S$. We say that $(h_k, T_x)$ is a disorder pair of $S$ if $(h_k, T_x)$ conforms to $S$ and $T_x \prec h_k$. Similarly, $(T_x, h_k)$ is a disorder pair of $S$ if $(T_x, h_k)$ conforms to $S$ and $h_k \prec T_x$. The disorder degree of $S$ is defined to be the number of disorder pairs in $S$.
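The disorder degree is a simple pair count. In this sketch, tasks are hypothetical `(r, d, c)` triples and the caller supplies a list `is_top` marking which positions hold top tasks. A pair counts when one element is a top task, the other is nontop, and the later element leads the earlier one; this covers both $(h_k, T_x)$ with $T_x \prec h_k$ and $(T_x, h_k)$ with $h_k \prec T_x$.

```python
from typing import List, Tuple

Task = Tuple[float, float, float]  # hypothetical encoding: (ready time r, deadline d, computation time c)


def leads(a: Task, b: Task) -> bool:
    """a ≺ b: r_a <= r_b and d_a <= d_b, but not both equalities at once."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0], a[1]) != (b[0], b[1])


def disorder_degree(seq: List[Task], is_top: List[bool]) -> int:
    """Number of disorder pairs: one top and one nontop task, the later one leading the earlier."""
    count = 0
    for p in range(len(seq)):
        for q in range(p + 1, len(seq)):
            if is_top[p] != is_top[q] and leads(seq[q], seq[p]):
                count += 1
    return count


if __name__ == "__main__":
    print(disorder_degree([(2, 6, 2), (0, 5, 1)], [True, False]))  # 1
```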

Theorem 4 Assume that the task set $\Gamma$ is feasible and $\rho$ is an optimal full schedule of $\Gamma$. Let $h_1 \trianglelefteq h_2 \trianglelefteq \cdots \trianglelefteq h_t$ be the top tasks and $T_x$ a nontop task of $\rho$. Assume that $\rho = L_{pre} \oplus L \oplus L_{post}$ such that

$(h_1, \ldots, h_{k-1})$ conforms to $L_{pre}$, and
$(h_{k+1}, \ldots, h_t)$ conforms to $L_{post}$.

We have the following properties:

(1) if $T_x \prec h_k$ and $L = (h_k, \ldots, T_x)$, then there exists another optimal full schedule $\rho' = L_{pre} \oplus \hat{L} \oplus L_{post}$ such that $\hat{L}$ is a permutation of $L$ and $(T_x, h_k)$ conforms to $\hat{L}$; besides, the disorder degree of $\rho'$ is less than that of $\rho$;

(2) if $h_k \prec T_x$ and $L = (T_x, \ldots, h_k)$, then there exists another optimal full schedule $\rho' = L_{pre} \oplus \hat{L} \oplus L_{post}$ such that $\hat{L}$ is a permutation of $L$ and $(h_k, T_x)$ conforms to $\hat{L}$; besides, the disorder degree of $\rho'$ is less than that of $\rho$.

Proof: We will prove (1) first. Let $F$ be a frame with $b_F = s_{h_k}$ and $e_F = f_x$. Since $T_x \prec h_k$, $e_F = f_x \le d_x \le d_{h_k}$. Also $b_F = s_{h_k} \ge r_{h_k}$. If there exists a task $T_{y_i}$ located between $h_k$ and $T_x$ such that $F \sqsupset T_{y_i}$, then $h_k \sqsupset T_{y_i}$ too. This contradicts the fact that $h_k$ is a top task. The condition of the Leading Theorem is satisfied. Hence there exists a sequence $\hat{L}$ which is a permutation of $L$ such that $(T_x, h_k)$ conforms to $\hat{L}$ and $f_{L_{pre} \oplus \hat{L} \oplus L_{post}} \le f_\rho$. Therefore, $L_{pre} \oplus \hat{L} \oplus L_{post}$ is also an optimal full schedule. Now let us look at Figure 4(iv). This is the schedule after the adjustment process of the Leading Theorem is made. The tasks whose deadlines are less than $e_F$ all lead $h_k$. Note that the disorder is a relationship defined between a nontop task and a top task, and $h_k$ is the only top task in the frame $F$. Therefore, no new disorder pairs with $h_k$ are introduced among these tasks. Similarly, the tasks whose ready times are greater than $b_F$ are all led by $h_k$. Therefore, no new disorder pairs are introduced. As for the remaining tasks, including $T_x$ and $h_k$, whose deadlines are greater than or equal to $e_F$ and whose ready times are less than or equal to $b_F$, they can be ordered arbitrarily. Hence, we can position $T_x$ before $h_k$ and remove the disorder pairs, if any, among these tasks by rearranging them in the proper order. Thus the disorder degree of $L$ is decremented by at least one. So the disorder degree of $\rho'$ is less than that of $\rho$. Property (2) holds for the same reason. □

Note that $T_x$ does not match $h_k$ or $h_{k+1}$; otherwise $T_x$ would also be a top task, which contradicts our assumption.

Theorem 5 Assume that the task set $\Gamma$ is feasible and $\rho$ is an optimal full schedule of $\Gamma$. Let $h_1 \trianglelefteq h_2 \trianglelefteq \cdots \trianglelefteq h_t$ be the top tasks of $\rho$. There exists an optimal full schedule $\rho'$ such that $(h_1, h_2, \ldots, h_t)$ conforms to $\rho'$, and for any nontop task $T_x$ such that $(h_k, T_x, h_{k+1})$ conforms to $\rho'$, either $T_x \sqsupset h_k$ or $T_x \sqsupset h_{k+1}$.

Proof: Assume that $T_x$ is a nontop task such that $(h_k, T_x, h_{k+1})$ conforms to $\rho'$. If $T_x$ contains neither $h_k$ nor $h_{k+1}$, then either $T_x \prec h_k$ or $h_{k+1} \prec T_x$. Hence, either $(h_k, T_x)$ or $(T_x, h_{k+1})$ is a disorder pair. We can eliminate it through Theorem 4, and the disorder degree is decremented by at least one. Whenever there is a disorder pair in the schedule, we can always apply Theorem 4 to eliminate it. The disorder degree is decremented in this way until it finally reaches zero. Hence, $(h_k, T_x, h_{k+1})$ conforming to $\rho'$ implies that $T_x$ does not lead $h_k$ and $h_{k+1}$ does not lead $T_x$. The only possibilities are either $T_x \sqsupset h_k$ or $T_x \sqsupset h_{k+1}$. □


Theorem 5 confirms the validity of rules R1 and R2.

Theorem 6 (Dominance Theorem) If a task set $\Gamma$ is feasible, there exists an optimal full schedule $\rho$ such that $\rho$ conforms to the super sequence of $\Gamma$.

Proof: In Theorem 5, we verify the existence of an optimal full schedule such that the top tasks are ordered according to their weakly leading relations, and the nontop tasks are located in the appropriate subsets between top tasks. The only work left is to order the nontop tasks in each subset. The adjustment process of the Leading Theorem can be applied, and the resultant order is exactly that specified by rule R3. So we can conclude that there exists an optimal full schedule $\rho$ which conforms to the super sequence. □

3.4  Conformation  Theorem 

If no task is rejected in $\rho$, there exists an optimal full schedule conforming to the super sequence of $\Gamma$. However, if $\Gamma$ is not feasible, some tasks in $\Gamma$ must be rejected. The dominant rules were developed under the assumption that no task is rejected; when tasks may be rejected, the situation is different. The question is whether the solution developed for the feasibility test can be applied to our optimization problem. Recall that by optimization we mean that the number of rejected tasks in the schedule is minimized first, and then the finish time of the schedule is minimized. When a task set is feasible, the optimal schedule is also the optimal full schedule. The difficulties are discussed in the next section, followed by our approach and the proof that it resolves them.

3.4.1 Difficulties

We wish to use the super sequence as the search space in our scheduling problem. The difficulties are twofold.

Figure 5: The optimal schedule may not conform to the super sequence.

First, when a task is allowed to be rejected, the dominant rules specifying the relations among containing tasks and contained tasks need to be modified, because the rules were developed under the assumption that no task is rejected. The new rules can become quite complicated. Consider the example depicted in Figure 5. Assume that the task set is

$\Gamma = \{\tau_1, \tau_2, \tau_3, \tau_4, \tau_5\}$,

and  the  super  sequence  of  the  task  set  is 


$\Lambda = (\tau_1, \tau_2, \tau_3, \boldsymbol{\tau_4}, \tau_2, \tau_3, \tau_1, \boldsymbol{\tau_5})$.


The top tasks are typed in bold for emphasis. $\Gamma$ is not feasible. One possible optimal schedule is

$\rho_0 = (\tau_2, \tau_1, \tau_3, \tau_5)$.

Clearly, $\rho_0$ does not conform to $\Lambda$. One may be able to show that another optimal schedule, $(\tau_2, \tau_4, \tau_3, \tau_5)$, conforms to $\Lambda$. However, for an arbitrary task set there is no guarantee that this can always be done. In the example, $\tau_4$ is rejected. If we recompute the super sequence without $\tau_4$, we get a different super sequence:

$\Lambda_0 = (\tau_1, \tau_2, \tau_1, \tau_3, \tau_1, \tau_5)$,

to which $\rho_0$ conforms. This poses a serious difficulty: it seems that we would need to check each task, constructing one super sequence under the condition that the task is accepted and another under the condition that it is rejected. In general, we would need to construct $2^n$ super sequences in this way. This is too costly for scheduling, considering that the number of instances in each individual super sequence can be exponential in the size of the task set.

Second, while rejecting a nontop task has little effect, rejecting a top task could affect the duplication and positions of the nontop tasks, or might even produce new top tasks. Thus, the super sequences can be totally different. Consider the same example in Figure 5. The rejection of $\tau_4$ results in two more top tasks, $\tau_2$ and $\tau_3$. This makes $\Lambda_0$ completely different from $\Lambda$.

We propose a swapping and replacing procedure to overcome these difficulties. The procedure is described and verified in the following section.

3.4.2  Approach  and  Proof 

The idea of our approach is stated briefly below, followed by a formal proof. Let $\Gamma$ and $\Lambda$ be the original task set and its super sequence, respectively. Clearly, for any task set there exists an optimal schedule, which is unknown to us. Let $\Gamma_0$ be the task set composed of the tasks of this unknown optimal schedule, and let $\Lambda_0$ be the super sequence of $\Gamma_0$. $\Gamma_0$ and $\Lambda_0$ are also unknown to us. As mentioned above, $\Lambda_0$ might be quite different from $\Lambda$. Notice that the unknown optimal schedule of $\Gamma$ is also an optimal (full) schedule of $\Gamma_0$. Since $\Gamma_0$ is feasible, by the Dominance Theorem there exists an optimal full schedule for $\Gamma_0$, say $\rho_0$, such that $\rho_0$ conforms to $\Lambda_0$. Our problem is that we cannot compute $\rho_0$ from $\Lambda_0$, because $\Lambda_0$ is unknown. We can, however, compute $\Lambda$ from $\Gamma$ by applying the dominant rules. The swapping and replacing procedure adjusts the order of the tasks in $\rho_0$, and replaces some tasks if necessary, so as to transform $\rho_0$ into a new schedule $\rho$ such that $\rho$ is also an optimal schedule and, best of all, $\rho$ conforms to an instance of $\Lambda$. For simplicity, we say that a schedule conforms to $\Lambda$ when it conforms to an instance of $\Lambda$. Hence we can use $\Lambda$ as a valid search space when scheduling $\Gamma$. In the example of Figure 5, we transform $\rho_0$ into


$\rho = (\tau_2, \tau_4, \tau_3, \tau_5)$.


This example is simple enough that the existence of $\rho$ can be verified by intuition. However, the reasoning is far more complicated than it appears at first glance. We prove in the following theorem that an optimal schedule $\rho$ conforming to $\Lambda$ always exists. The corresponding lemmas are presented in the next section.
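When the tasks of a schedule are distinct, conforming to some instance of $\Lambda$ amounts, under our reading of the definition, to the schedule occurring in $\Lambda$ as an order-preserving subsequence: each rejected task can be placed at any of its occurrences. The Python sketch below checks this with greedy leftmost matching on the sequences of the Figure 5 example; the function name and the string encoding of tasks are hypothetical.

```python
def conforms(seq, super_seq):
    """Return True if the tasks of `seq` occur in `super_seq` in the same order.

    Greedy leftmost matching: each task of `seq` is matched to its first
    occurrence after the occurrence matched for the previous task.
    """
    pos = 0
    for task in seq:
        while pos < len(super_seq) and super_seq[pos] != task:
            pos += 1
        if pos == len(super_seq):
            return False  # no later occurrence of `task`
        pos += 1          # consume the matched occurrence
    return True

# Figure 5 example, tasks written as strings.
LAMBDA = ["t1", "t2", "t3", "t4", "t2", "t3", "t1", "t5"]
LAMBDA_0 = ["t1", "t2", "t1", "t3", "t1", "t5"]
rho_0 = ["t2", "t1", "t3", "t5"]
rho = ["t2", "t4", "t3", "t5"]

print(conforms(rho_0, LAMBDA))    # False
print(conforms(rho, LAMBDA))      # True
print(conforms(rho_0, LAMBDA_0))  # True
```

Leftmost matching is sufficient here: taking the earliest usable occurrence never removes an option for the tasks that follow.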

Theorem 7 (Conformation Theorem) Given a task set $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$, there exists an optimal schedule $\rho$ such that $\rho$ conforms to the super sequence $\Lambda$ of $\Gamma$.

Proof. Given any task set $\Gamma$, there exists at least one optimal schedule, which is unknown to us. Assume that we need to reject $w$ tasks from $\Gamma$ to make a feasible schedule. Let $\Gamma_0$ be the task set composed of the tasks in the unknown optimal schedule; $\Gamma_0$ is a subset of $\Gamma$. The super sequence of $\Gamma_0$ is denoted by $\Lambda_0$. In addition, for $0 \le j \le w$, we use $\Gamma_j$ to denote a task set derived by adding $j$ tasks into $\Gamma_0$, $\Lambda_j$ the super sequence of $\Gamma_j$, and $\rho_j$ an optimal schedule of $\Gamma_j$. By adding $j$ tasks into $\Gamma_0$ we mean that the resulting task set $\Gamma_j$ is composed of distinct tasks and is a subset of $\Gamma$. In particular, $\Gamma_w$ is $\Gamma$. We prove by induction on $w$ that there exists an optimal schedule $\rho_w$ for $\Gamma$ conforming to $\Lambda_w$.

Base step $w = 0$: no task is rejected, and $\Gamma = \Gamma_0$. Since $\Gamma$ is feasible, by the Dominance Theorem there exists an optimal (full) schedule $\rho_0$ for $\Gamma$ such that $\rho_0$ conforms to $\Lambda_0$.

Induction hypothesis: assume that the theorem holds when $w = j$, i.e., $|\rho_0| = n - j$. For the task set $\Gamma_j$ derived by adding $j$ tasks into $\Gamma_0$, there exists an optimal schedule $\rho_j$ for $\Gamma$ such that $\rho_j$ conforms to $\Lambda_j$. Notice that $|\rho_j| = |\rho_0|$ and $|\Gamma_j| = |\Gamma_0| + j$.

Now consider the case $w = j + 1$, i.e., $|\rho_0| = n - (j + 1)$. We need to reject $j + 1$ tasks to make a feasible schedule. By the induction hypothesis, there exists an optimal schedule $\rho_j$ for $\Gamma$ conforming to $\Lambda_j$. We want to show that, by swapping and replacing tasks in $\rho_j$, the resulting sequence $\rho_{j+1}$ conforms to $\Lambda_{j+1}$; moreover, $|\rho_{j+1}| = |\rho_j|$ and $f_{\rho_{j+1}} \le f_{\rho_j}$, which implies that $\rho_{j+1}$ is also an optimal schedule for $\Gamma$. Let $\tau_x$ be the task added into $\Gamma_j$ to make $\Gamma_{j+1}$, so that $\Gamma_j \cup \{\tau_x\} = \Gamma_{j+1}$. There are two possibilities when adding $\tau_x$.

If $\tau_x$ is a nontop task of $\Gamma_{j+1}$, adding $\tau_x$ does not add a top task into $\Gamma_j$. The orders of the top tasks in $\Lambda_{j+1}$ and $\Lambda_j$ derived through rule R1 are exactly the same. Rule R2 specifies the relation between a nontop task and a top task; adding a nontop task $\tau_x$ does not affect the relations between the already existing nontop tasks and top tasks, so the positions (duplicates) of the already existing nontop tasks in $\Lambda_j$ are preserved in $\Lambda_{j+1}$. Rule R3 specifies how to order the nontop tasks within each subset; again, adding a nontop task $\tau_x$ does not alter the orders of the already existing nontop tasks within each subset of $\Lambda_j$. Therefore, if the task being added is a nontop task, $\Lambda_j$ is a subsequence of $\Lambda_{j+1}$. Consider the example in Figure 5, and assume that $\Gamma_j$ and $\Gamma_{j+1}$ are

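$\Gamma_j = \{\tau_2, \tau_3, \tau_4, \tau_5\}$, and

$\Gamma_{j+1} = \{\tau_1, \tau_2, \tau_3, \tau_4, \tau_5\}$,

where $\tau_1$ is a nontop task. The corresponding super sequences would be

$\Lambda_j = (\tau_2, \tau_3, \tau_4, \tau_2, \tau_3, \tau_5)$, and

$\Lambda_{j+1} = (\tau_1, \tau_2, \tau_3, \tau_4, \tau_2, \tau_3, \tau_1, \tau_5)$.

We can see in the example how $\Lambda_j$ conforms to $\Lambda_{j+1}$.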

Otherwise, $\tau_x$ is a top task of $\Gamma_{j+1}$, so $\tau_x$ does not contain other tasks in $\Gamma_{j+1}$. Two situations are possible.

(i) $\tau_x$ is not contained by other tasks. The number of top tasks in $\Gamma_{j+1}$ is one more than that in $\Gamma_j$. The order of the top tasks in $\Lambda_j$ is preserved in $\Lambda_{j+1}$, since the relations among the top tasks are not altered by adding $\tau_x$. Furthermore, $\tau_x$ does not alter any existing order between nontop tasks and top tasks, or among nontop tasks, as specified by rules R2 and R3, respectively. Therefore, $\Lambda_j$ is a subsequence of $\Lambda_{j+1}$. Consider the example in Figure 6, and assume that $\Gamma_j$ and $\Gamma_{j+1}$ are

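$\Gamma_j = \{\tau_1, \tau_2, \tau_4, \tau_5\}$, and

$\Gamma_{j+1} = \{\tau_1, \tau_2, \tau_3, \tau_4, \tau_5\}$,

where $\tau_3$ is a top task not contained by other tasks. The corresponding super sequences would be

$\Lambda_j = (\tau_1, \tau_2, \tau_1, \tau_4, \tau_5, \tau_4)$, and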


$\Lambda_{j+1} = (\tau_1, \tau_2, \tau_1, \tau_3, \tau_4, \tau_5, \tau_4)$.

Figure 6: The added top task $\tau_3$ is not contained by other tasks.

We can see in the example how $\Lambda_j$ conforms to $\Lambda_{j+1}$.

(ii) $\tau_x$ is contained by some top tasks and/or nontop tasks of $\Gamma_j$. Let the top tasks of $\Gamma_j$ containing $\tau_x$ be $g_1, \ldots, g_m$, indexed in the weakly leading order. This situation is more complicated, because $g_i$, $i = 1, \ldots, m$, turn out to be nontop tasks in $\Gamma_{j+1}$. There exists a total ordering of them by weakly leading relations, because there are no containing relations among the $g_i$. By rule R1, the super sequence of $\Gamma_{j+1}$ can be expressed as

$$\Lambda_{j+1} = (\ldots, h_{k-1}, \ldots, h_k, \ldots, h_{k+1}, \ldots),$$

where $h_1, \ldots, h_{k-1}, h_k, h_{k+1}, \ldots$ are the top tasks in $\Gamma_{j+1}$, and in particular $h_k$ represents $\tau_x$. By rule R3, the super sequence of $\Gamma_{j+1}$ can also be expressed as

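$$\Lambda_{j+1} = (\ldots, h_{k-1}, \underbrace{B_{k-1,\overline{k}}, B_{k-1,k}, B_{\overline{k-1},k}, h_k, B_{k,\overline{k+1}}, B_{k,k+1}, B_{\overline{k},k+1}}_{\Omega_{j+1}}, h_{k+1}, \ldots) \qquad (6)$$

where $\Omega_{j+1}$ represents the subsequence of $\Lambda_{j+1}$ between $h_{k-1}$ and $h_{k+1}$, excluding $h_{k-1}$ and $h_{k+1}$, as depicted above, and $\Omega_{j+1} = B_{k-1,\overline{k}} \oplus B_{k-1,k} \oplus B_{\overline{k-1},k} \oplus h_k \oplus B_{k,\overline{k+1}} \oplus B_{k,k+1} \oplus B_{\overline{k},k+1}$, where $\oplus$ means concatenation of sequences. Remember that $g_1, \ldots, g_m$ are top tasks of $\Gamma_j$. All the top tasks in $\Gamma_j$ are in the order $h_1, \ldots, h_{k-1}, g_1, \ldots, g_m, h_{k+1}, \ldots$ by the weakly leading relations. By rule R1, the super sequence of $\Gamma_j$ can be expressed as

$$\Lambda_j = (\ldots, h_{k-1}, \underbrace{\ldots, g_1, \ldots, g_m, \ldots}_{\Omega_j}, h_{k+1}, \ldots) \qquad (7)$$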


where $\Omega_j$ represents the subsequence of $\Lambda_j$ between $h_{k-1}$ and $h_{k+1}$, excluding $h_{k-1}$ and $h_{k+1}$, as depicted above. Notice that in Equations 6 and 7, the subsequences of $\Lambda_{j+1}$ and $\Lambda_j$ before $h_{k-1}$ are exactly the same, because the addition of the top task $h_k$ (that is, $\tau_x$) affects only the subsequence between $h_{k-1}$ and $h_{k+1}$. Similarly, the subsequences of $\Lambda_{j+1}$ and $\Lambda_j$ after $h_{k+1}$ are exactly the same. Hence an instance of $\Lambda_{j+1}$ differs from an instance of $\Lambda_j$ only in the subsequences $\Omega_{j+1}$ and $\Omega_j$.

Now we check which tasks in $\Omega_j$ should immediately follow $h_{k-1}$. By Lemma 7, all the tasks in $\Omega_j$ can be found in $\Omega_{j+1} - h_k$, so we only need to check the tasks in $\Omega_{j+1} - h_k$. If a task contains $h_{k-1}$ but not $h_k$, the task cannot contain $g_1$: because $g_1$ contains $h_k$, any task which contains $g_1$ also contains $h_k$. When constructing the subsequence of $\Lambda_j$ between $h_{k-1}$ and $g_1$, by rule R2 all the tasks which appear in $B_{k-1,\overline{k}}$ of $\Lambda_{j+1}$ should immediately follow $h_{k-1}$, and by rule R3 their order is exactly the same as in $B_{k-1,\overline{k}}$ of $\Lambda_{j+1}$.

One may observe that some tasks in $B_{k-1,k}$ contain $h_{k-1}$ but do not contain $g_1$, so they would also be positioned between $h_{k-1}$ and $g_1$. These tasks follow the tasks of $B_{k-1,\overline{k}}$, because they do contain $h_k$ and hence have greater deadlines than the tasks in $B_{k-1,\overline{k}}$. For the same reason, all the tasks which appear in $B_{\overline{k},k+1}$ of $\Lambda_{j+1}$ should be located immediately before $h_{k+1}$ when constructing $\Lambda_j$, in the same order. Hence $\Lambda_j$ can be further expressed as


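$$\Lambda_j = (\ldots, h_{k-1}, B_{k-1,\overline{k}}, \underbrace{\ldots, g_1, \ldots, g_m, \ldots}_{\Omega'_j}, B_{\overline{k},k+1}, h_{k+1}, \ldots) \qquad (8)$$

where $\Omega'_j$ represents the subsequence of $\Lambda_j$ between $B_{k-1,\overline{k}}$ and $B_{\overline{k},k+1}$, excluding $B_{k-1,\overline{k}}$ and $B_{\overline{k},k+1}$. We have $\Omega_j = B_{k-1,\overline{k}} \oplus \Omega'_j \oplus B_{\overline{k},k+1}$. By Lemma 9, all the tasks in $\Omega'_j$ are either in $B_{k-1,k}$ or in $B_{\overline{k-1},k}$.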

Consider the example in Figure 7. The task set in the figure is $\Gamma_{j+1}$, and $\Gamma_{j+1} - h_k$ would be $\Gamma_j$. $g_1$ and $g_2$ contain only $h_k$, so $g_1$ and $g_2$ are classified as nontop tasks in $\Gamma_{j+1}$ and as top tasks in $\Gamma_j$. The super sequences $\Lambda_{j+1}$ and $\Lambda_j$ of this task set are computed by rules R1 to R3.


Returning to Equations 6 and 8, we find that $\Lambda_j$ and $\Lambda_{j+1}$ differ only in their middle subsequences: $\Omega'_j$ in $\Lambda_j$, and the part of $\Omega_{j+1}$ between $B_{k-1,\overline{k}}$ and $B_{\overline{k},k+1}$ in $\Lambda_{j+1}$. This can also be seen in the example in Figure 7. The instances extracted from $\Lambda_j$ would conform to $\Lambda_{j+1}$ except for the middle subsequence mentioned above. Remember that $\rho_j$ conforms to $\Lambda_j$. We adjust the order of the tasks of the subsequence of $\rho_j$ corresponding to $\Omega_j$ so that the resulting schedule $\rho_{j+1}$ conforms to $\Lambda_{j+1}$ and is also an optimal schedule of $\Gamma$. The adjustment procedure applied to $\rho_j$, called the swapping and replacing method, is described below:


C1: all tasks $\tau_y \in \Omega'_j$ such that $f_y \le d_{h_k}$ are sorted by their ready times $r_y$.

C2: all tasks $\tau_y \in \Omega'_j$ such that $s_y \ge r_{h_k}$ are sorted by their deadlines $d_y$.

C3: a task that satisfies both conditions above can be sorted by either C1 or C2.

C4: if there exists a task $\tau_y \in \Omega'_j$ such that $s_y < r_{h_k}$ and $f_y > d_{h_k}$, $\tau_y$ is replaced by $h_k$.
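Conditions C1 to C4 classify each task of $\Omega'_j$ by comparing its start time $s_y$ and finish time $f_y$ with the window of the rejected top task $h_k$, which is the case analysis later carried out in Lemma 11. A minimal Python sketch of that classification follows; the function name and argument names are hypothetical, and the use of $\ge$ and $\le$ at the boundaries is an assumption.

```python
def classify(s_y, f_y, r_hk, d_hk):
    """Return the condition ('C1'..'C4') met by a task of the middle subsequence.

    s_y, f_y: start and finish time of the task in the current schedule.
    r_hk, d_hk: ready time and deadline of the rejected top task h_k.
    Boundary handling (>= and <=) is an assumption.
    """
    starts_late = s_y >= r_hk   # starts no earlier than h_k's ready time
    ends_early = f_y <= d_hk    # finishes no later than h_k's deadline
    if starts_late and ends_early:
        return "C3"  # may be sorted by either C1 or C2
    if ends_early:
        return "C1"  # sort by ready time
    if starts_late:
        return "C2"  # sort by deadline
    return "C4"      # execution covers h_k's whole window: replace by h_k

for s, f in [(1, 4), (3, 8), (3, 5), (1, 8)]:
    print((s, f), classify(s, f, r_hk=2, d_hk=6))
# prints C1, C2, C3, C4 for the four cases in turn
```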

We now show that the adjustment makes $\rho_{j+1}$ conform to $\Lambda_{j+1}$. Remember that $\Lambda_j$ and $\Lambda_{j+1}$ differ only in the middle subsequences: $\Omega'_j$ and the corresponding middle part of $\Omega_{j+1}$. We only swap and replace the tasks of $\rho_j$ located in $\Omega_j$ to derive $\rho_{j+1}$. Since $\rho_j$ conforms to $\Lambda_j$, the head and the tail of $\rho_{j+1}$ also conform to $\Lambda_{j+1}$. So we only need to check the middle subsequence of $\rho_{j+1}$ to see whether the whole sequence $\rho_{j+1}$ conforms to $\Lambda_{j+1}$. By Lemma 9, the tasks adjusted by the swapping and replacing procedure are either in $B_{k-1,k}$ or in $B_{\overline{k-1},k}$. Let us first check the adjustment of C1. In $\Lambda_{j+1}$, the order of the tasks in $B_{k-1,k}$ can be determined arbitrarily according to rule R3, so it does not matter which task is located before which. In $\Lambda_{j+1}$, the order of the tasks in $B_{\overline{k-1},k}$ is determined by their ready times. During the adjustment of C1, all the qualifying tasks are ordered according to their ready times. We know that the ready times of the tasks in $B_{k-1,k}$ are less than the ready times of the tasks in $B_{\overline{k-1},k}$, so in the resulting schedule $\rho_{j+1}$ the tasks of $B_{k-1,k}$ are positioned before the tasks of $B_{\overline{k-1},k}$. This indicates that the order of these tasks in $\rho_{j+1}$ after the adjustment of C1 conforms to $\Lambda_{j+1}$. For the same reason, the adjustment of C2 makes the order of the swapped tasks conform to $\Lambda_{j+1}$. In condition C4, if such a $\tau_y$ exists, replacing $\tau_y$ by $h_k$ also conforms to $\Lambda_{j+1}$, as can be seen in Equation 6. By Lemma 11, each task $\tau_y \in \Omega'_j$ satisfies one of the conditions. Hence, all the tasks in the middle subsequence of $\rho_{j+1}$ are adjusted in such a way that their order conforms to $\Lambda_{j+1}$. So $\rho_{j+1}$ conforms to $\Lambda_{j+1}$.

Next we show that $\rho_{j+1}$, in addition to conforming to $\Lambda_{j+1}$, finishes no later than $\rho_j$. We can view the tasks satisfying condition C1 as having the same virtual deadline $d_{h_k}$, because they all finish before this time instant. Hence, there is a total ordering among these tasks by weakly leading relations, which is achieved by sorting their ready times. By Lemma 10, the finish time of the resulting schedule after the adjustment of C1 is not greater than that of the original schedule. Similarly, the tasks satisfying condition C2 can be viewed as having the same virtual ready time $r_{h_k}$, because they all start after this time instant. For the same reason, the finish time of the resulting schedule after the adjustment of C2 is not greater than that of the original schedule. In condition C3, the qualifying tasks can be sorted either way without affecting the result. In condition C4, if there exists a task $\tau_y$ whose execution covers the whole window of the rejected task $h_k$, we may as well replace $\tau_y$ by $h_k$, and the finish time of the resulting schedule after the adjustment of C4 is not greater than that of the original schedule. By Lemma 11, each task $\tau_y \in \Omega'_j$ satisfies one of the conditions. Hence, all the tasks in the middle subsequence of $\rho_j$ are adjusted in such a way that $\rho_{j+1}$ finishes no later than $\rho_j$. Therefore, $|\rho_{j+1}| = |\rho_j|$ and $f_{\rho_{j+1}} \le f_{\rho_j}$. Since $\rho_j$ is an optimal schedule of $\Gamma$, $\rho_{j+1}$ is also an optimal schedule of $\Gamma$.

Figure 8 illustrates how the adjustment procedure shortens the finish time. In Figure 8(i), both $\tau_y$ and $\tau_z$ satisfy condition C3, and $\tau_j$ satisfies condition C1. The procedure of C1 is applied to all three tasks and shortens the finish time. The dotted task window frame in the figure indicates that $h_k$ is a rejected task. In Figure 8(ii), C1 is applied to the qualifying tasks $\tau_y$ and $\tau_j$, and by C4, $\tau_z$ is replaced by $h_k$. This shortens the finish time. Although $h_k$ is the rejected task before the adjustment, it is $\tau_z$, whose execution covers the whole window of $h_k$, that is rejected after the adjustment.

So far, we have shown that $\rho_{j+1}$ conforms to $\Lambda_{j+1}$ and that $\rho_{j+1}$ is also an optimal schedule of $\Gamma$. The theorem is thus proved by induction. Note that we never actually apply the swapping and replacing procedure to any schedule; it serves only to show the existence of the optimal schedule, which is $\rho_{j+1}$ in this context. The structure of the theorem is illustrated in Figure 9. □


3.4.3  Corresponding  Lemmas 

The lemmas used by the Conformation Theorem are given below.
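Lemma 5 $B_{k-1,\overline{k}} \cap B_{k-1,k} = B_{k-1,k} \cap B_{\overline{k-1},k} = B_{k-1,\overline{k}} \cap B_{\overline{k-1},k} = \emptyset$, and $B_{k,\overline{k+1}} \cap B_{k,k+1} = B_{k,k+1} \cap B_{\overline{k},k+1} = B_{k,\overline{k+1}} \cap B_{\overline{k},k+1} = \emptyset$.

Figure 8: The swapping and replacing procedures C1, C3, and C4.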

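Proof: We first show that $B_{k-1,\overline{k}} \cap B_{k-1,k} = \emptyset$. Given any task $\tau_y \in B_{k-1,\overline{k}}$, $\tau_y$ does not contain $h_k$ by definition. Hence, $\tau_y$ does not belong to $B_{k-1,k}$, so $B_{k-1,\overline{k}} \cap B_{k-1,k} = \emptyset$. The other equalities are proved similarly. □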

Lemma 6 $B_{k-1,k} \cup B_{\overline{k-1},k} = B_{k,\overline{k+1}} \cup B_{k,k+1}$.

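Proof: We first prove that if a task $\tau_y \in B_{k-1,k} \cup B_{\overline{k-1},k}$, then $\tau_y \in B_{k,\overline{k+1}} \cup B_{k,k+1}$. Because $\tau_y$ contains $h_k$ by definition, $\tau_y$ must also have a location after $h_k$ by rule R2, so $\tau_y \in B_{k,\overline{k+1}} \cup B_{k,k+1} \cup B_{\overline{k},k+1}$. Since $\tau_y$ contains $h_k$, it does not belong to $B_{\overline{k},k+1}$, so $\tau_y \in B_{k,\overline{k+1}} \cup B_{k,k+1}$. We can prove similarly that if a task $\tau_y \in B_{k,\overline{k+1}} \cup B_{k,k+1}$, then $\tau_y \in B_{k-1,k} \cup B_{\overline{k-1},k}$. Hence $B_{k-1,k} \cup B_{\overline{k-1},k} = B_{k,\overline{k+1}} \cup B_{k,k+1}$. □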
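As used in the proofs above, each set $B$ collects the tasks that contain a given combination of two adjacent top tasks, with an overline marking the top task that is not contained. Lemmas 5 and 6 can be checked mechanically on a small example. The Python sketch below assumes a hypothetical encoding in which a task is a (ready time, deadline) window and containment means that one window covers the other; the windows are made up for illustration.

```python
def contains(a, b):
    """Assumed: window a = (ready, deadline) covers window b."""
    return a[0] <= b[0] and b[1] <= a[1]

def split(tasks, first, second):
    """Split tasks by containment of two adjacent top tasks.

    Returns (contains `first` only, contains both, contains `second` only).
    """
    only_first, both, only_second = set(), set(), set()
    for name, w in tasks.items():
        c1, c2 = contains(w, first), contains(w, second)
        if c1 and c2:
            both.add(name)
        elif c1:
            only_first.add(name)
        elif c2:
            only_second.add(name)
    return only_first, both, only_second

# Hypothetical windows for h_{k-1}, h_k, h_{k+1} and five nontop tasks.
h_prev, h_k, h_next = (0, 3), (5, 8), (10, 12)
tasks = {"a": (0, 4), "b": (0, 9), "c": (4, 9), "d": (5, 12), "e": (9, 13)}

B_km1_notk, B_km1_k, B_notkm1_k = split(tasks, h_prev, h_k)
B_k_notk1, B_k_k1, B_notk_k1 = split(tasks, h_k, h_next)

# Lemma 5: each triple is pairwise disjoint (true by construction of split).
# Lemma 6: both unions equal the set of tasks that contain h_k.
print(B_km1_k | B_notkm1_k == B_k_notk1 | B_k_k1)  # True
```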
Figure 9: The Structure of the Conformation Theorem

Lemma 7 $\tau_y \in \Omega_j$ if and only if $\tau_y \in \Omega_{j+1} - h_k$.

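Proof: We first prove the "if" part. By Lemma 6, $\Omega_{j+1} - h_k$, which consists of the tasks in $B_{k-1,\overline{k}}$, $B_{k-1,k}$, $B_{\overline{k-1},k}$, $B_{k,\overline{k+1}}$, $B_{k,k+1}$, and $B_{\overline{k},k+1}$, reduces to $B_{k-1,\overline{k}} \cup B_{k-1,k} \cup B_{\overline{k-1},k} \cup B_{\overline{k},k+1}$. So we only need to check these four sets. If $\tau_y \in B_{k-1,\overline{k}}$ or $\tau_y \in B_{k-1,k}$, then $\tau_y$ has a location between the top tasks $h_{k-1}$ and $g_1$ in $\Lambda_j$, because $\tau_y$ contains $h_{k-1}$ and therefore has a location after $h_{k-1}$ by rule R2. So $\tau_y \in \Omega_j$. Next consider $\tau_y \in B_{\overline{k-1},k}$. $\tau_y$ is either a top task or a nontop task in $\Lambda_j$. If $\tau_y$ is a top task in $\Lambda_j$, $\tau_y$ must be one of the $g_i$, $i = 1, \ldots, m$, by the definition of $g_i$. If $\tau_y$ is a nontop task in $\Lambda_j$, it must contain at least one top task, which by Equation 7 is among $g_1, \ldots, g_m$, or $h_{k+1}$. Notice that $\tau_y$ does not contain $h_{k-1}$, since $\tau_y \in B_{\overline{k-1},k}$. Whether $\tau_y$ is a top or a nontop task in $\Lambda_j$, $\tau_y$ has a location in $\Omega_j$ by rule R2. If $\tau_y \in B_{\overline{k},k+1}$, then $\tau_y$ has a location between the top tasks $g_m$ and $h_{k+1}$ in $\Lambda_j$ by rule R2. So $\tau_y \in \Omega_j$.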


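Now we prove the "only if" part. Let $\tau_y$ be a task located in $\Omega_j$. If $\tau_y$ is one of the top tasks $g_1, \ldots, g_m$, then $\tau_y$ contains $h_k$, so either $\tau_y \in B_{k-1,k}$ or $\tau_y \in B_{\overline{k-1},k}$. Otherwise, $\tau_y$ is a nontop task of $\Lambda_j$. By rule R2, $\tau_y$ contains at least one of the top tasks $h_{k-1}, g_1, \ldots, g_m, h_{k+1}$. If $\tau_y$ contains one of $g_1, \ldots, g_m$, then $\tau_y$ also contains $h_k$. So $\tau_y$ contains $h_{k-1}$, $h_k$, or $h_{k+1}$. By rule R2, $\tau_y$ falls between $h_{k-1}$ and $h_k$, and/or between $h_k$ and $h_{k+1}$. So $\tau_y \in B_{k-1,\overline{k}} \cup B_{k-1,k} \cup B_{\overline{k-1},k} \cup B_{k,\overline{k+1}} \cup B_{k,k+1} \cup B_{\overline{k},k+1}$, that is, $\tau_y \in \Omega_{j+1} - h_k$. □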

This lemma means that the tasks in $\Omega_{j+1} - h_k$ are exactly the tasks in $\Omega_j$.

Lemma 8 If $\tau_y \in \Omega'_j$, then $\tau_y$ contains $h_k$.

Proof: We show that if $\tau_y$ does not contain $h_k$, then $\tau_y$ does not belong to $\Omega'_j$. Since each $g_i$, $i = 1, \ldots, m$, contains $h_k$, a task $\tau_y$ that does not contain $h_k$ does not contain any $g_i$ either. Hence $\tau_y$ cannot have a location in the subsequence between $g_1$ and $g_m$ in $\Lambda_j$. The only possible locations of $\tau_y$ in $\Omega_j$ are between $h_{k-1}$ and $g_1$, or between $g_m$ and $h_{k+1}$. If $\tau_y$ falls between $h_{k-1}$ and $g_1$, then, since $\tau_y$ does not contain $g_1$, $\tau_y$ contains $h_{k-1}$ by rule R2; that is, $\tau_y \in B_{k-1,\overline{k}}$. A nontop task cannot have duplicate positions in the same region between two adjacent top tasks, and $B_{k-1,\overline{k}}$ is located between $h_{k-1}$ and $g_1$ by Equation 8, so $\tau_y$ has no location in the head of $\Omega'_j$ before $g_1$. For the same reason, $\tau_y$ has no location in the tail of $\Omega'_j$ after $g_m$. So $\tau_y$ does not belong to $\Omega'_j$. Therefore, if $\tau_y \in \Omega'_j$, $\tau_y$ contains $h_k$. □

Lemma 9 If $\tau_y \in \Omega'_j$, then $\tau_y \in B_{k-1,k} \cup B_{\overline{k-1},k} = B_{k,\overline{k+1}} \cup B_{k,k+1}$.

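Proof: If $\tau_y \in \Omega'_j$, then $\tau_y \in \Omega_j$. By Lemma 7, $\tau_y \in B_{k-1,\overline{k}} \cup B_{k-1,k} \cup B_{\overline{k-1},k} \cup B_{k,\overline{k+1}} \cup B_{k,k+1} \cup B_{\overline{k},k+1}$. By Lemma 8, $\tau_y$ contains $h_k$, so $\tau_y \in B_{k-1,k} \cup B_{\overline{k-1},k} \cup B_{k,\overline{k+1}} \cup B_{k,k+1}$. By Lemma 6, we then have $\tau_y \in B_{k-1,k} \cup B_{\overline{k-1},k} = B_{k,\overline{k+1}} \cup B_{k,k+1}$. □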

Lemma 10 Assume that $S = L_{pre} \oplus L \oplus L_{post}$ is a feasible sequence, where $L = (\tau_{x_1}, \tau_{x_2}, \ldots, \tau_{x_l})$. If there exists a sequence $L' = (\tau_{y_1}, \tau_{y_2}, \ldots, \tau_{y_l})$ such that $L'$ is a permutation of $L$ and the tasks of $L'$ are ordered by the weakly leading relation, then $f_{L_{pre} \oplus L' \oplus L_{post}} \le f_{L_{pre} \oplus L \oplus L_{post}}$.


Proof: We bubble sort the tasks of $L$ into weakly leading order, so each swap occurs between two adjacent tasks. For each swap, we apply the Leading Theorem to the two adjacent tasks, which play the roles of the two tasks in that theorem. No other task lies between the two tasks during an individual swap. So, by the Leading Theorem, the finish time of the resulting schedule is not greater than that of the original schedule.

□ 
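Lemma 10 is applied in the Conformation Theorem to tasks that share a virtual deadline (condition C1), in which case the weakly leading order reduces to ready-time order. The Python sketch below computes the finish time of a non-preemptive sequence and checks, on a made-up three-task block, that ready-time order finishes no later than any other order. It illustrates the statement on one example and is not a proof; the (ready, computation, deadline) encoding is hypothetical.

```python
from itertools import permutations

def finish_time(seq, start=0):
    """Run (ready, computation, deadline) tasks non-preemptively in order.

    Each task starts at max(previous finish, its ready time). Returns the
    finish time of the last task, or None if some task misses its deadline.
    """
    t = start
    for ready, comp, deadline in seq:
        t = max(t, ready) + comp
        if t > deadline:
            return None
    return t

# Three tasks sharing the virtual deadline 20, as in condition C1.
block = [(5, 3, 20), (0, 4, 20), (2, 2, 20)]
by_ready = sorted(block)  # with a common deadline: weakly leading = ready-time order

print(finish_time(block))     # 14
print(finish_time(by_ready))  # 9
# No order of the block finishes earlier than the ready-time order.
print(all(finish_time(p) >= finish_time(by_ready) for p in permutations(block)))  # True
```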

Lemma 11 A task $\tau_y \in \Omega'_j$ satisfies one of the conditions C1, C2, C3, or C4.

Proof: If $\tau_y \in \Omega'_j$, then $\tau_y \in B_{k-1,k} \cup B_{\overline{k-1},k}$ by Lemma 9, which implies that $\tau_y$ contains $h_k$. Hence $r_y \le r_{h_k}$ and $d_y \ge d_{h_k}$. There are four possibilities.

(i) $s_y < r_{h_k}$ and $f_y > d_{h_k}$: C4 is satisfied.

(ii) $s_y \ge r_{h_k}$ and $f_y \le d_{h_k}$: C3 is satisfied.

(iii) $s_y < r_{h_k}$ and $f_y \le d_{h_k}$: C1 is satisfied.

(iv) $s_y \ge r_{h_k}$ and $f_y > d_{h_k}$: C2 is satisfied.

□ 


3.5  Set-Scheduler  Algorithm 

By the Conformation Theorem, there exists an optimal schedule which conforms to the super sequence $\Lambda$. Hence, we can use Sequence-Scheduler to schedule each instance in the super sequence and pick the best result. Since Sequence-Scheduler obtains the optimal schedule for each instance, we end up with the optimal schedule for the task set. The algorithm for scheduling a task set is given in Figure 10. Sequence-Scheduler takes $O(n^2)$ time for each instance, and there are $N$ instances to check in the super sequence, as illustrated in the previous section. The time complexity of the Set-Scheduler algorithm is thus $O(N \cdot n^2)$.


Algorithm Set-Scheduler:

Input: a task set Γ = {τ_1, τ_2, ..., τ_n}
Output: the optimal schedule ρ for Γ

  compute the super sequence Λ for Γ
  ρ := ∅
  for each instance I in the super sequence Λ
      invoke Sequence-Scheduler to compute the optimal schedule σ(I) of I
      if (|ρ| < |σ(I)|) or (|ρ| = |σ(I)| and f_ρ > f_σ(I))
          ρ := σ(I)
      endif
  endfor
  return ρ


Figure  10:  Set-Scheduler  Algorithm 
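The Python sketch below mirrors the loop of Figure 10 under several stated assumptions. The construction of the super sequence by rules R1 to R3 is not reproduced, so the example passes a hand-written super sequence. An instance is assumed to keep exactly one occurrence of each task, so the number of instances enumerated is the product of the tasks' occurrence counts. `sequence_scheduler` is a hypothetical stand-in, not the paper's Sequence-Scheduler: it is an $O(n^2)$ dynamic program that keeps the given order and, for each number of accepted tasks, the earliest finish time.

```python
from itertools import product

def sequence_scheduler(order, tasks):
    """Stand-in for Sequence-Scheduler (not the authors' algorithm).

    Keeps the tasks in the given order and chooses which to accept so that the
    number accepted is maximal and, among those, the finish time is minimal.
    best[c] = (earliest finish time, accepted names) with c accepted tasks.
    """
    best = [(0, [])]
    for name in order:
        r, c, d = tasks[name]
        for cnt in range(len(best) - 1, -1, -1):  # descending: each task used once
            finish, chosen = best[cnt]
            t = max(finish, r) + c
            if t > d:
                continue  # accepting `name` here would miss its deadline
            if cnt + 1 == len(best):
                best.append((t, chosen + [name]))
            elif t < best[cnt + 1][0]:
                best[cnt + 1] = (t, chosen + [name])
    finish, chosen = best[-1]
    return chosen, finish

def instances(super_seq):
    """Yield every instance: one chosen occurrence per task, in sequence order."""
    positions = {}
    for i, name in enumerate(super_seq):
        positions.setdefault(name, []).append(i)
    names = list(positions)
    for choice in product(*(positions[n] for n in names)):
        yield [n for _, n in sorted(zip(choice, names))]

def set_scheduler(super_seq, tasks):
    """Schedule every instance and keep the best (more tasks, then earlier finish)."""
    best = None
    for inst in instances(super_seq):
        sched, finish = sequence_scheduler(inst, tasks)
        if (best is None or len(sched) > len(best[0])
                or (len(sched) == len(best[0]) and finish < best[1])):
            best = (sched, finish)
    return best

# Hypothetical tasks (ready, computation, deadline); "a" contains "b", so the
# hand-written super sequence duplicates "a" around "b".
tasks = {"a": (0, 4, 5), "b": (1, 2, 4), "c": (3, 3, 9)}
print(set_scheduler(["a", "b", "a", "c"], tasks))  # (['b', 'c'], 6)
```

No order accepts all three tasks here, and of the two-task schedules (b, c) finishes earliest.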


4  Evaluation 

Experiments were conducted to compare the performance of Set-Scheduler with that of the well-known Earliest-Deadline-First (EDF) and Least-Laxity-First (LLF) heuristic algorithms. The relations among the tasks are important for their schedulability. To study how different cases differ, we vary the computation times and the interarrival times, i.e., the intervals between the ready times of two consecutive tasks. Tasks in a task set are generated in non-descending order of their ready times. The parameters of the experiments are random variables with truncated normal distributions, as shown in Figure 11. If the computation time of a task is greater than its window length, the computation time is truncated to the window length. No such truncation is applied to the interarrival times.
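The text does not specify how the EDF and LLF heuristics handle tasks that cannot meet their deadlines. The Python sketch below is one plausible non-preemptive variant, assumed for illustration only: whenever the processor is free, it takes the ready task with the earliest deadline (EDF) or the least laxity $d - t - c$ (LLF) and rejects that task if it cannot finish by its deadline. The function name, the task encoding, and the tie-breaking by task name are hypothetical.

```python
def np_heuristic(tasks, rule="EDF"):
    """Non-preemptive list scheduling with rejection (one plausible variant).

    tasks: name -> (ready, computation, deadline).
    Returns (accepted names in execution order, finish time of the last one).
    """
    t = 0
    finish = 0
    remaining = set(tasks)
    schedule = []
    while remaining:
        ready = [n for n in remaining if tasks[n][0] <= t]
        if not ready:
            t = min(tasks[n][0] for n in remaining)  # idle until the next arrival
            continue

        def priority(n):
            r, c, d = tasks[n]
            # EDF: earliest deadline; LLF: least laxity at the current time t.
            return (d if rule == "EDF" else d - t - c, n)

        pick = min(ready, key=priority)
        remaining.remove(pick)
        r, c, d = tasks[pick]
        if t + c <= d:          # accept only if the deadline can still be met
            schedule.append(pick)
            t += c
            finish = t
    return schedule, finish

tasks = {"a": (0, 4, 5), "b": (1, 2, 4), "c": (3, 3, 9)}
print(np_heuristic(tasks, "EDF"))  # (['a', 'c'], 7)
print(np_heuristic(tasks, "LLF"))  # (['a', 'c'], 7)
```

On this toy set, running b and then c also accepts two tasks but finishes at 6, so both heuristics accept the maximum number of tasks without reaching the optimal finish time.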

The mean of the window length is fixed. The computation time ratio is the ratio of the computation time to the window length. The mean interarrival time ranges from 10% to 100% of the mean window length. The standard deviations of these three random variables are set to 20%, 50%, and 80% of their means. For simplicity, the three random variables use the same standard-deviation ratio within each individual experiment. For each combination of parameters, 100 task sets, each with 12 tasks, are generated for scheduling.

Parameter                  Mean
Window length              10.0
Computation time ratio     0.25, 0.5, 0.75
Interarrival time          1.0, 2.0, ..., 10.0

Figure 11: Parameters of the experiments
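A minimal Python sketch of the task-set generator described above follows. The text does not say how the normal distributions are truncated; this sketch assumes that draws are repeated until they are positive, and that the first task is ready at time 0. The function names and default parameter values are hypothetical.

```python
import random

def truncated_normal(rng, mean, sd):
    """Draw from N(mean, sd), redrawing until the value is positive (assumed truncation)."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0:
            return x

def generate_task_set(rng, n=12, window_mean=10.0, ratio_mean=0.5,
                      interarrival_mean=5.0, sd_ratio=0.8):
    """Return n (ready, computation, deadline) tasks in non-descending ready-time order.

    Every standard deviation is sd_ratio times the corresponding mean; a
    computation time longer than the window is truncated to the window length.
    """
    tasks = []
    ready = 0.0  # assumption: the first task is ready at time 0
    for i in range(n):
        if i > 0:
            ready += truncated_normal(rng, interarrival_mean, sd_ratio * interarrival_mean)
        window = truncated_normal(rng, window_mean, sd_ratio * window_mean)
        ratio = truncated_normal(rng, ratio_mean, sd_ratio * ratio_mean)
        comp = min(ratio * window, window)
        tasks.append((ready, comp, ready + window))
    return tasks

rng = random.Random(42)
task_set = generate_task_set(rng, ratio_mean=0.75, interarrival_mean=1.0, sd_ratio=0.8)
```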

We compare the performance of these algorithms by (1) percentage of accepted tasks: the number of tasks accepted by the algorithm over the number of tasks in the optimal schedule found by exhaustive search; (2) success ratio: the number of times the algorithm produces an optimal schedule in the 100 task sets; and (3) comparisons per task set: the number of comparisons each algorithm performs per task set. When interarrival times are small, containing relations among tasks are more likely to occur. Figure 12 shows that the heuristic algorithms perform worse under this condition and tend to reject more tasks, especially when the computation time ratio is larger. Set-Scheduler always reaches a 100% acceptance rate, since it is an optimal scheduler. Because the data with different standard deviation ratios show similar characteristics, only the data with a standard deviation ratio of 0.8 are depicted in the figure. In terms of success ratio, shown in Figure 13, the heuristic algorithms perform even worse. Generally speaking, the heuristic algorithms usually produce suboptimal schedules but fail to produce optimal ones most of the time. The search space is shown in Figure 14. Set-Scheduler performs well at the expense of complexity, which may become very large when the interarrival times are small. The cost is more reasonable when the interarrival times between tasks are not too small.
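The first two metrics can be computed from per-task-set results as in the Python sketch below. Pooling the counts over all task sets, and counting a schedule as optimal when it matches both the number of accepted tasks and the finish time of the exhaustive-search optimum, are assumptions of this sketch; the function name and result layout are hypothetical.

```python
def evaluate(results):
    """results: one (alg_accepted, alg_finish, opt_accepted, opt_finish) tuple per task set.

    Returns (percentage of accepted tasks, success ratio in percent).
    """
    accepted = sum(r[0] for r in results)
    optimal = sum(r[2] for r in results)
    successes = sum(1 for a, f, oa, of in results if a == oa and f == of)
    return 100.0 * accepted / optimal, 100.0 * successes / len(results)

pct, success = evaluate([(2, 7, 2, 6), (3, 9, 3, 9), (1, 4, 2, 8)])
print(round(pct, 2), round(success, 2))  # 85.71 33.33
```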


Figure 12: Percentage of accepted tasks: (a) EDF and (b) LLF compared with Set-Scheduler
Figure 14: Number of comparisons per task set

5 Concluding Remarks and Future Work

In this paper, we discuss optimization techniques in real-time scheduling of aperiodic tasks on a uniprocessor system under a non-preemptive discipline. We first propose a Sequence-Scheduler algorithm that computes the optimal schedule for a sequence in $O(n^2)$ time. We then propose a Set-Scheduler algorithm based on the super sequence and the Sequence-Scheduler algorithm. The complexity of our Set-Scheduler algorithm is $O(N \cdot n^2)$, compared to $O(N \cdot n)$ for the feasibility test by Erschler et al., where $N$ may be exponential in the worst case. However, our simulation results show that the cost is reasonable in the average case. We explore the temporal properties relevant to the optimization problem and present several theorems that formalize the results. The study of temporal properties on a uniprocessor may serve as a basis for the more complex case of multiprocessor systems.

For future work, we propose to incorporate the decomposition technique [18] into our scheduling algorithm. Under this approach, a task set can be decomposed into subsets, which introduces backtracking points that reduce the search space. This has been shown to reduce the search space substantially when the task set is well decomposable.
ABSTRACT

Consider the problem of scheduling a set of n tasks on a single processor such that a feasible schedule, one that satisfies the time constraints of each task, is generated. An exhaustive search may be required to generate such a feasible schedule or to establish that none exists. In that case the computational complexity of the search is $O(n!)$.

We propose to generate the feasible schedule in two steps. In the first step, we decompose the set of tasks into m subsets by analyzing their ready times and deadlines. An ordering of these subsets is also specified such that, in a feasible schedule, all tasks in an earlier subset in the ordering appear before tasks in a later subset. With no simplification of the scheduling of tasks within a subset, the scheduling complexity is $O(\sum_{i=1}^{m} n_i!)$, where $n_i$ is the number of tasks in the $i$th subset.

How much this approach reduces the scheduling complexity depends on the number and size of the subsets generated. Experimental results indicate that significant improvement can be expected in most situations.
I Introduction


Consider the problem of nonpreemptive scheduling of n tasks on a single CPU of a hard real-time system. For task $T_i$, identified as $i$, the scheduling request consists of a triple $\langle c_i, r_i, d_i \rangle$. Here $c_i$ is the computation time, $r_i$ is the ready time before which task $i$ cannot start, and $d_i$ is the deadline before which the computation must be completed. The time interval $[r_i, d_i]$ is called the time window, denoted by $w_i$. The window length $|w_i|$ is $d_i - r_i$. In a hard real-time system, a schedule is called feasible if all tasks are processed within their individual windows.

The result of the scheduling process is a schedule in which, for any task $i$, a start time $s_i$ and a finish time $f_i$ are identified, where $f_i = s_i + c_i$. Clearly, a schedule is feasible if, for every task $i$,

$$r_i \le s_i \le d_i - c_i. \qquad (1)$$

The schedule is nonpreemptive only if, for any two tasks $i$ and $j$,

$$s_i < s_j \implies s_i + c_i \le s_j. \qquad (2)$$

In other words, when task $i$ is scheduled, a span of nonpreemptable processing time, $c_i$, is allocated for it. No other task may be in execution during that time span. Thus the scheduling problem is to find a mapping from a task set $\{i\}$ to a start time set $\{s_i\}$ such that constraints (1) and (2) are met. Note that for a given set of tasks $\{i\}$, there may be none, one, or many feasible schedules.
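Constraints (1) and (2) can be made concrete with a small checker. The following is a minimal Python sketch, not taken from the paper, that tests whether a given assignment of start times is a feasible nonpreemptive schedule. The names `Task` and `is_feasible` are hypothetical, and each task is assumed to be given as the triple (computation time, ready time, deadline).

```python
from typing import Dict, NamedTuple


class Task(NamedTuple):
    c: float  # computation time c_i
    r: float  # ready time r_i
    d: float  # deadline d_i


def is_feasible(tasks: Dict[str, Task], start: Dict[str, float]) -> bool:
    """Check constraints (1) and (2) for a nonpreemptive single-CPU schedule."""
    # Constraint (1): each task starts no earlier than r_i and ends by d_i.
    for name, t in tasks.items():
        s = start[name]
        if not (t.r <= s <= t.d - t.c):
            return False
    # Constraint (2): ordered by start time, each task must finish before the
    # next one starts, so no two executions overlap.
    order = sorted(tasks, key=lambda name: start[name])
    for a, b in zip(order, order[1:]):
        if start[a] + tasks[a].c > start[b]:
            return False
    return True
```

Only consecutive tasks need to be compared. Once start times are sorted, a task that finishes before its successor starts also finishes before every later task starts.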

In general, the nonpreemptive real-time scheduling problem is known to be NP-complete [Gare79]. To find a feasible schedule, the number of schedules to be examined is $O(n!)$, which we count as the scheduling complexity. Heuristic techniques can be used to reduce the complexity [Ma84, McMa75, Mok83, Zhao87]. This reduction, however, comes at the cost of a potentially suboptimal solution: when looking for feasible schedules, heuristic techniques may fail to yield a feasible schedule even though one exists. Schedules based on the earliest-deadline-first or minimum-laxity-first rules are examples of such heuristics.

An alternate approach is to develop analytical methods for scheduling [Ersc83, Liu73]. This approach analyzes the relationships among real-time tasks and schedules. The purpose is to determine optimal task schedules precisely, or to narrow the scope of the original search space.

The objective of this research is to develop correct and efficient algorithms for nonpreemptive real-time scheduling. We call a scheduling algorithm correct if, whenever a feasible schedule exists, the algorithm finds it.


In  this  paper,  we  present  an  analytical  decomposition  approach  for  real-time  scheduling. 
The  strategy  is  to  divide  a  set  of  tasks  into  a  sequence  of  subsets,  such  that  the  search  for 
feasible  schedules  is  only  performed  within  each  subset.  The  decomposition  technique  used 
for  generating  the  sequence  of  subsets  assures  that  in  a  feasible  schedule  all  tasks  in  a  subset 
earlier  in  the  sequence  are  scheduled  before  any  task  in  a  later  subset.  Backtracking  in  the 
search  is  bounded  within  each  subset,  which  significantly  reduces  the  scheduling  complexity. 

There are several different strategies that can be used to partition tasks into subsets. The decomposition strategy discussed in this paper uses a relation called the leading relation, which depends on the tasks' relative window positions.

We performed an experiment that examined the number and size of subsets with regard to the number of tasks, the task arrival rate, and the window length. We found that, in general, the number of tasks in any subset is independent of the total number of tasks to be scheduled if the task window lengths are bounded. In that case the cost of decomposition scheduling grows only polynomially with the number of tasks. As a consequence, the decomposition method is practical to implement.

In Section II we present some basic notions used in the paper. In Section III we discuss the case where all the tasks have the leading relation with each other. Our approach of decomposition scheduling is introduced in Section IV, along with the concepts of the single schedule subset and the decomposed leading schedule sequence. We present our experimental results in Section V. Section VI presents our conclusions and directions for future research.

II  Background 

If we consider any two tasks $i$ and $j$, they must have one of these three relations:

1. leading - $i \prec j$ (or $j \prec i$), if $r_i \le r_j$, $d_i \le d_j$, and $w_i \ne w_j$.

2. matching - $i \parallel j$, if $r_i = r_j$ and $d_i = d_j$.

3. containing - $i \sqcup j$ (or $j \sqcup i$), if $r_i < r_j$ and $d_j < d_i$.

These three relations are shown in Fig. 1. It is easy to see that the leading, matching, and containing relations are all transitive. Additionally, if $i \parallel j$ or $i \sqcup j$, we say that $i$ and $j$ are concurrent.
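The three relations can be classified directly from two windows. The following is a minimal sketch: the function names `relation` and `concurrent` and the string labels they use are illustrative assumptions, not notation from the paper. The sketch follows the definitions above. Leading uses non-strict comparisons with unequal windows, and containing uses strict ones, so exactly one case applies to any pair.

```python
from typing import NamedTuple


class Window(NamedTuple):
    r: float  # ready time
    d: float  # deadline


def relation(i: Window, j: Window) -> str:
    """Return the relation between the windows of tasks i and j."""
    if i.r == j.r and i.d == j.d:
        return "matching"          # i || j
    if i.r <= j.r and i.d <= j.d:
        return "i leads j"         # i precedes j, windows differ
    if j.r <= i.r and j.d <= i.d:
        return "j leads i"         # j precedes i, windows differ
    if i.r < j.r and j.d < i.d:
        return "i contains j"
    return "j contains i"          # only case left: j.r < i.r and i.d < j.d


def concurrent(i: Window, j: Window) -> bool:
    """i and j are concurrent if their windows match or one contains the other."""
    return relation(i, j) not in ("i leads j", "j leads i")
```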

The length of a schedule is the finish time of the last task in the schedule. An example is shown in Fig. 2.

The concept of dominance was introduced in [Ersc83], and we will use it later in the discussion.

Definition 1 For two schedules $S_1$ and $S_2$, $S_1$ dominates $S_2$ if and only if:

$$S_2 \text{ feasible} \implies S_1 \text{ feasible}.$$
Figure 1: Task window relations

Definition 2 A set of schedules, $\mathcal{S}$, is dominant if $\forall S_2 \notin \mathcal{S}$, $\exists S_1 \in \mathcal{S}$ such that $S_1$ dominates $S_2$.

A schedule is dominant if it dominates all other schedules.

III The Leading Schedule Sequence

Let us consider the case where, for a set of tasks $\{i\}$, every pair of tasks in the set has a leading relation, i.e., $i \prec j$ or $j \prec i$ for every $i, j$ with $i \ne j$.

Based on the leading relation we can define a total order of tasks for the set. We define the leading schedule sequence (LSS) to be a sequence of tasks ordered according to the leading relation. That is, for any $i$ and $j$, $i \prec j \implies i \lhd j$, where $i \lhd j$ means that $i$ is scheduled in front of $j$.

Theorem  1  For  a  set  of  tasks  all  of  which  have  a  pairwise  leading  relation,  the  schedule 
where  tasks  are  sequenced  in  order  of  the  leading  schedule  sequence  is  a  dominant  one. 

Proof:  We  prove  this  theorem  by  construction. 
Figure 2: An example of one schedule

Suppose this set of tasks has a feasible schedule $S$ in which tasks occur in a different sequence than the leading schedule sequence. Examining this schedule, let $i$ and $j$ be the first pair of tasks that are not ordered by the leading relation, i.e., $i \prec j$ but $j$ is scheduled in front of $i$. From the leading relation we know that $r_i \le r_j$ and $d_i \le d_j$.

Since $j$ and $i$ are the first such pair, the deadlines of all tasks between $j$ and $i$ in $S$ must be greater than or equal to $d_j$ as well as $d_i$. From $S$, let us construct another schedule $S'$ by moving $i$ from its current position to the position just in front of $j$. Because $r_i \le r_j$, task $i$ can start where $j$ used to start. The start and finish times of the tasks between $j$ and $i$, including $j$, increase by no more than $c_i$. Each of these tasks originally finished no later than $s_i$, so it now finishes no later than $s_i + c_i = f_i \le d_i$, which does not exceed its own deadline. The rest of the schedule is unchanged. Thus if $S$ is feasible, the new schedule $S'$ is feasible too.

By repeating the process of constructing $S'$ from $S$, we obtain a schedule that has all tasks ordered according to the leading relation. If the original schedule is feasible, so is the constructed one. □

Thus, if any feasible schedule exists, there is a feasible schedule that conforms to the order of the leading schedule sequence. The result can be generalized to the situation where matching windows exist: tasks with the same window are combined into one task whose computation time is the sum of the computation times of all these tasks.
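Theorem 1 and its generalization to matching windows suggest a direct procedure. Merge the tasks that share a window, sort by the leading order, and pack the tasks as early as possible. The following is a minimal sketch that assumes every pair of distinct windows has a leading relation. The names `leading_schedule` and `window` are hypothetical. For pairwise-leading windows, sorting by (ready time, deadline) reproduces the leading order.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class Task(NamedTuple):
    c: float  # computation time
    r: float  # ready time
    d: float  # deadline


def window(t: Task) -> Tuple[float, float]:
    return (t.r, t.d)


def leading_schedule(tasks: List[Task]) -> Tuple[List[Task], bool]:
    """Order tasks by the leading schedule sequence and test feasibility.

    Assumes every pair of distinct windows has a leading relation. Tasks with
    matching windows are merged into one task whose computation time is the
    sum of theirs.
    """
    merged = [
        Task(sum(t.c for t in group), r, d)
        for (r, d), group in groupby(sorted(tasks, key=window), key=window)
    ]
    # Pack each task as early as possible in the leading order.
    finish = float("-inf")
    for t in merged:
        finish = max(t.r, finish) + t.c
        if finish > t.d:
            return merged, False
    return merged, True
```

By Theorem 1, a `False` result under this assumption means that no ordering of these tasks is feasible.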

We  will  see  that  the  above  leading  schedule  sequence  is  a  special  case  of  the  decomposed 
leading  schedule  sequence  introduced  in  the  next  section. 

IV  Task  Decomposition 

A.  Philosophy 

To solve the general real-time scheduling problem with n tasks, the number of schedules to be examined can be as large as $O(n!)$. On closer inspection, however, every task has an important property called the locality of a task: a task is time-bounded by its time window. Furthermore, if two task windows do not overlap, there is only one possible order for them. These facts motivate us to separate the tasks into subsets according to their different time localities.

The decomposition scheduling can be divided into two steps: decomposition and scheduling.

First, a set of n tasks is decomposed into a sequence of m subsets such that the order of the subsets is fixed. The order of a task is determined only relative to the other tasks within its own subset. The sequence of the subsets is called the decomposed schedule sequence. The decomposition must not compromise schedulability at all: if the tasks have a feasible schedule, they must have one that respects the subset order. The decomposition using the leading relation introduced in this paper has this property.

The second step is to schedule the subsets in sequence order. For each subset, the method always selects a schedule with the shortest length, which maximizes the time span available to the next subset. In this way, the total number of schedules to be examined is only $O(\sum_{i=1}^{m} n_i!)$, where $n_i$ is the number of tasks in the $i$th subset.
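The scheduling step can be illustrated with a brute-force sketch. Each subset is scheduled in sequence order, starting where the previous subset finished. Every ordering of the subset is tried, and the feasible ordering with the earliest finish time (the shortest length) is kept. This is a minimal illustration under those assumptions, and the names `packed_finish` and `schedule_subsets` are hypothetical. Its cost per subset is $n_i!$ orderings, which matches the exhaustive bound above.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional, Sequence


class Task(NamedTuple):
    c: float  # computation time
    r: float  # ready time
    d: float  # deadline


def packed_finish(seq: Sequence[Task], t0: float) -> Optional[float]:
    """Finish time of seq packed from time t0, or None if a deadline is missed."""
    finish = t0
    for t in seq:
        finish = max(t.r, finish) + t.c
        if finish > t.d:
            return None
    return finish


def schedule_subsets(subsets: List[List[Task]]) -> Optional[List[Task]]:
    """Schedule subsets in order, keeping the shortest feasible ordering of each."""
    schedule: List[Task] = []
    t0 = float("-inf")
    for subset in subsets:
        best, best_finish = None, None
        for perm in permutations(subset):       # n_i! orderings
            f = packed_finish(perm, t0)
            if f is not None and (best_finish is None or f < best_finish):
                best, best_finish = perm, f
        if best is None:
            return None                         # no feasible ordering for this subset
        schedule.extend(best)
        t0 = best_finish
    return schedule
```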

The only remaining problem is how to decompose a set of tasks into a sequence of subsets such that a feasible schedule is guaranteed to be found if one exists. In the rest of this paper, we outline how to use the leading relation as a means to divide the task set.

B.  Decomposition  Scheme 

A set of tasks is called a single schedule subset (sss), represented as $\tau$, if

$$\forall i \in \tau\ \exists j \in \tau:\ (i \sqcup j) \lor (j \sqcup i) \lor (i \parallel j).$$

In other words, each task window is contained in the window of another task, contains the window of another task, or matches the window of another task in the subset.

Given a set of tasks $\{i\}$, we can decompose it into a sequence of single schedule subsets $\tau^1, \tau^2, \ldots, \tau^m$ such that all the tasks in $\tau^i$ lead all the tasks in $\tau^j$ for $i < j$. The decomposed leading schedule sequence (DLSS) is defined to be a sequence of single schedule subsets, denoted as

$$\mathrm{DLSS} = \tau^1 \circ \tau^2 \circ \cdots \circ \tau^m,$$

such that $\forall k^i \in \tau^i\ \forall k^j \in \tau^j:\ k^i \prec k^j$ for $1 \le i < j \le m$ (denoted as $\tau^i \prec \tau^j$), and $\tau^i$ cannot be further decomposed, for $i = 1, \ldots, m$. The symbol $\circ$ represents concatenation.

Note that if a task in $\tau^i$ does not lead a task in $\tau^j$ for $i < j$, the two tasks must have a matching or containing relation. If this happens, $\tau^i$ and $\tau^j$ cannot be different single schedule subsets. In the extreme case, all n tasks may belong to one single schedule subset.

Theorem  2  The  set  of  schedules  conforming  to  the  decomposed  leading  schedule  sequence 
is  dominant. 

Proof: Assume that there are two tasks $k^i \in \tau^i$ and $k^j \in \tau^j$, where $\tau^i \prec \tau^j$, such that no task is concurrent with both $k^i$ and $k^j$. Assume also that $k^j$ is positioned in front of $k^i$ in a feasible schedule $S$. Specifically, $S = (\cdots) \circ (k^j \circ \cdots \circ k^i) \circ (\cdots)$. For brevity, let $S' = (k^j \circ \cdots \circ k^i)$, so that $S = (\cdots) \circ S' \circ (\cdots)$. We show that the new schedule created by exchanging the positions of $k^i$ and $k^j$ is still feasible.

Without loss of generality, suppose that $k^i$ and $k^j$ are the first such pair in $S$. The tasks between $k^j$ and $k^i$ are led by $k^j$ or concurrent with $k^j$, but they neither lead nor are concurrent with $k^i$. Since $k^i \prec k^j$ (i.e., $r_{k^i} \le r_{k^j}$), switching the positions of $k^i$ and $k^j$ will not increase the finish time of $S'$, defined as the finish time of the last task in $S'$. All the tasks between $k^i$ and $k^j$, including $k^j$, are led by $k^i$, i.e., they have deadlines greater than or equal to $d_{k^i}$. If $S$ is feasible with $k^i$ as the last task in $S'$, it will still be feasible after the switch. □

Note that if no schedule following the decomposed leading schedule sequence is feasible, then no feasible schedule exists for the tasks to be scheduled.

C.  Decomposition  Algorithm 

To decompose a set of tasks into single schedule subsets, the algorithm starts with the tasks sorted by their ready times. When ready times are equal, ties are broken by deadline.


The algorithm uses a single loop to determine which single schedule subset the current task should belong to. The loop body consists of two parts. The first part is a while loop that merges single schedule subsets into one if the current task is contained by tasks in them. The second part decides whether the current task forms a new single schedule subset or joins the current single schedule subset.

The Leading-relation Decomposition Algorithm
begin
    /* Initialization. Tasks 1..n are sorted by ready time, then deadline. */
    /* r^k, d^k: earliest ready time and latest deadline of subset τ^k.  */
    k = 1; τ^1 = {1};
    r^1 = r_1; d^1 = d_1;

    for i = 2 to n do   /* Go over the task list. */
        l = k - 1;   /* l is the index of single schedule subsets. */
        continue = TRUE;
        while (l > 0) ∧ (continue) do
            /* Merge single schedule subsets if the current task is concurrent
               with tasks in different subsets. */
            if (d^l > d_i)
                τ^l = τ^l ∪ τ^k;
                d^l = d^k;
                k = l;
            else
                continue = FALSE;
            l = l - 1;
        od

        if (r^k ≤ r_i) ∧ (d_i ≤ d^k)
            /* The current task is concurrent with tasks in the current subset. */
            τ^k = τ^k ∪ {i};
        else if (r^k ≤ r_i) ∧ (d^k < d_i)
            /* The current task is led by all the tasks in the current subset.
               A new single schedule subset is created containing only the current task. */
            k = k + 1;
            τ^k = {i};
            r^k = r_i;
            d^k = d_i;
    od
end


In this algorithm, the outer loop is executed $n - 1$ times. In total, the body of the while loop is executed a number of times at most proportional to n, since no more than n subsets can be merged during the whole execution of the algorithm with n tasks. Thus the complexity of this algorithm is only $O(n)$. If we include the cost of sorting, the decomposition costs no more than $O(n \log n)$.
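The decomposition algorithm translates into a short Python sketch. It is an illustrative rendering of the pseudocode, and the function name `decompose` is hypothetical. Subsets are kept as a stack, and `latest[k]` plays the role of $d^k$. Because $r^k \le r_i$ always holds after sorting, the ready-time test is omitted. As in the pseudocode, a task whose deadline equals $d^k$ joins the current subset. This choice is conservative (it may produce a coarser decomposition) but does not affect correctness.

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    c: float  # computation time
    r: float  # ready time
    d: float  # deadline


def decompose(tasks: List[Task]) -> List[List[Task]]:
    """Leading-relation decomposition into single schedule subsets (DLSS order)."""
    if not tasks:
        return []
    # Sort by ready time, breaking ties by deadline.
    order = sorted(tasks, key=lambda t: (t.r, t.d))
    subsets: List[List[Task]] = [[order[0]]]
    latest: List[float] = [order[0].d]   # d^k: latest deadline in subset k
    for t in order[1:]:
        # While loop: an earlier subset whose latest deadline exceeds d_i holds a
        # task containing t, so all subsets after it are merged into it.
        while len(subsets) > 1 and latest[-2] > t.d:
            tail, tail_deadline = subsets.pop(), latest.pop()
            subsets[-1].extend(tail)
            latest[-1] = tail_deadline      # d^l = d^k (the larger one)
        if t.d <= latest[-1]:
            subsets[-1].append(t)           # concurrent with the current subset
        else:
            subsets.append([t])             # led by every task in the current subset
            latest.append(t.d)
    return subsets
```

For example, windows [0, 4], [1, 5] and [2, 3] collapse into a single subset. The first two windows each contain the third, so the third task forces a merge.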


D.  Scheduling  Scheme 


After the tasks have been decomposed into a sequence of subsets, scheduling is performed on each subset in sequence order, such that the schedule of each subset has the shortest length. A brute-force method is an exhaustive search whose computational complexity amounts to $O(\sum_{i=1}^{m} n_i!)$, where $n_i$ is the number of tasks in the $i$th subset.

In [Yuan89b], another scheme is explored for scheduling a subset. The method first builds a super-sequence in which tasks may have several occurrences. The occurrences of a task are decided by its relative window position in the subset. Selecting one occurrence for every task in the super-sequence forms a schedule. The cited paper gives the worst-case cost of a complete search over the super-sequence. In sample calculations with $n_i$ less than 100, this cost is a much smaller number than $n_i!$.

The set of schedules following the decomposed leading schedule sequence is dominant, and the subsets are scheduled in sequence order with their shortest lengths. Therefore decomposition scheduling with the leading relation is correct, as proved in [Yuan89b].


V  Empirical  Study 

A.  Experiment 

We conducted an experiment to assess whether our approach is feasible for practical implementation. It observes how the number of tasks in a single schedule subset, and the number of subsets created, vary with the number of tasks to be scheduled, the task arrival rate, and the task window length.

The  outputs  we  are  interested  in  are: 

1.  the  number  of  single  schedule  subsets  (sss), 

2.  the  number  of  window  concurrences, 

3.  the  maximum  number  of  tasks  in  single  schedule  subsets, 

4.  the  minimum  number  of  tasks  in  single  schedule  subsets,  and 


5.  the  average  number  of  tasks  in  single  schedule  subsets. 

One window concurrence is counted for any two tasks $i$ and $j$ that have a concurrent relation. We call the number of tasks in a single schedule subset the size of the subset.

We varied the following parameters independently to observe the changes in the outputs:

1.  the  number  of  total  tasks, 

2.  task  arrival  rate,  and 

3.  window  length. 

The data are shown in Tables 1-4¹ at the end of this paper. The basic rules of the experiment are as follows.

1. The computation time is uniformly distributed over $(0, \alpha]$.

2. The task interarrival time is uniformly distributed over $[0, \beta)$. The arrival rate is $2/\beta$.

3. The window length is also generated randomly, by controlling the laxity of each task. The laxity of a task is the difference between its window length and its computation time. The laxity is uniformly distributed over $[0, \gamma)$. This distribution guarantees that the window length is no less than the computation time of the task.

Note that the arrival rate should be less than or equal to the service rate, which is $2/\alpha$ because the mean computation time is $\alpha/2$. Otherwise the system becomes congested, which results in missed deadlines. In other words, $2/\beta \le 2/\alpha$, that is,

$$\alpha \le \beta.$$
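The three generation rules can be sketched as follows. This is a minimal illustration rather than the original experiment code. It uses Python's `random.Random` in place of the UNIX generator cited below, and the function name `generate_tasks` and its `seed` parameter are hypothetical. Since `random()` returns values in [0, 1), the expression `alpha * (1 - random())` falls in (0, α] as rule 1 requires.

```python
import random
from typing import List, NamedTuple


class Task(NamedTuple):
    c: float  # computation time
    r: float  # ready time
    d: float  # deadline


def generate_tasks(n: int, alpha: float, beta: float, gamma: float,
                   seed: int = 0) -> List[Task]:
    """Generate n tasks: c ~ U(0, alpha], interarrival ~ U[0, beta),
    laxity ~ U[0, gamma); deadline = ready time + c + laxity."""
    if alpha > beta:
        raise ValueError("alpha must not exceed beta (arrival rate <= service rate)")
    rng = random.Random(seed)
    tasks: List[Task] = []
    ready = 0.0
    for _ in range(n):
        ready += beta * rng.random()          # interarrival time in [0, beta)
        c = alpha * (1.0 - rng.random())      # computation time in (0, alpha]
        laxity = gamma * rng.random()         # laxity in [0, gamma)
        tasks.append(Task(c, ready, ready + c + laxity))
    return tasks
```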

The random numbers are provided by the function drand() in the UNIX operating system. The numbers are uniformly distributed over [0, 1) [Stev86].

In  the  experiment,  we  found  that  the  minimum  size  of  single  schedule  subsets  is  always 
one. 

B. Results and Observations

From the experimental results, we found that when the average window length increases ($\gamma$ increases), the number of single schedule subsets decreases and the maximum size of single schedule subsets slightly increases. This result is expected: the larger some task windows are, the more tasks may be concurrent with them, and these tasks may fall into the same single schedule subset.

¹In the tables, "num" stands for number, "W." for window, "concurr." for concurrences, "avg" for average, and "sss" for single schedule subset.

When $\beta$ increases, that is, when the arrival rate decreases, the number of single schedule subsets increases and the maximum size of single schedule subsets decreases. This result is also expected. When the arrival rate decreases, tasks are less likely to be concurrent with each other, and most tasks have the leading relation with each other.


Figure 3: The relationship between the size of single schedule subsets and the number of tasks with regard to the laxity parameter $\gamma$, where $\alpha = 4$, $\beta = 4$.

Fig. 3 shows the relationship between the maximum size of single schedule subsets and the number of tasks to be scheduled. In the experiment, the size of a single schedule subset never exceeded 14, even when 300 tasks were being scheduled. This observation indicates that, in most cases, $\max_i(n_i)$ is bounded by a constant that does not grow with n.
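The sketch below shows why a bounded subset size matters. It computes a worst-case bound on $\sum_i n_i!$ for 300 tasks when no subset holds more than 14 tasks, the largest size observed in the experiment. The function name `worst_case_cost` is hypothetical. The bound assumes the worst possible split, with as many subsets of 14 as possible. Moving a task from a smaller subset into a larger one never decreases the sum of factorials, so this split maximizes the sum. The real experimental subsets are mostly much smaller.

```python
import math


def worst_case_cost(n: int, max_size: int) -> int:
    """Largest possible sum of n_i! when n tasks are split into subsets
    of at most max_size tasks (filling subsets to max_size maximizes it)."""
    full, rest = divmod(n, max_size)
    return full * math.factorial(max_size) + (math.factorial(rest) if rest else 0)


bound = worst_case_cost(300, 14)
print(bound)                          # 1830744115920, i.e. about 1.8e12 orderings
print(bound < math.factorial(300))    # True: 300! has more than 600 digits
```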

We  show  the  relationship  between  the  number  of  subsets  (m)  and  number  of  tasks  to 
be  scheduled  in  Fig.  4  and  Fig.  5  with  regard  to  different  window  length  and  arrival  rate 
distributions. 
Figure 4: The relationship between the number of single schedule subsets and the number of tasks with regard to the laxity parameter $\gamma$, where $\alpha = 4$, $\beta = 4$.

VI  Final  Remarks 

In this paper, we examine the problem of nonpreemptive scheduling of n tasks on a single CPU in hard real-time systems. We propose a correct decomposition strategy for the scheduling. The strategy significantly reduces the scheduling complexity in most cases.

In this paper we have examined a decomposition technique based only on the windows of tasks. By taking the computation time requirements into account, the decomposition can be made stronger [Yuan89a]. The decomposition approach may also be extended to consider precedence and other dependencies among tasks. This aspect of the decomposition technique needs further study.
Table 1: α = 4, β = 4

num of   γ    avg W.   num of     num of   sss size
tasks         length   concurr.   sss      max   avg
150      2    3.09      32        118       4    1.27
150      4    3.78      40        114       4    1.32
150      6    4.93      68         92       6    1.63
150      8    5.85      91         73       7    2.05
150     10    6.56     111         68      10    2.21
200      2    3.11      41        159       4    1.26
200      4    3.67      64        144       4    1.39
200      6    4.94      98        116       6    1.72
200      8    5.94     102        113       8    1.77
200     10    7.27     161         81       8    2.47
250      2    3.02      46        204       4    1.23
250      4    3.82      93        167       5    1.50
250      6    4.83     107        160       6    1.56
250      8    6.33     162        121       8    2.07
250     10    7.29     186        103       -    2.43
 -       -     -         -          -       4    1.27
 -       -     -         -          -       -    1.46
 -       -     -         -          -       6    1.55
 -       -     -         -          -      11    1.96
 -       -     -         -          -      11    2.40

Table 2: α = 4, β = 6

num of   γ    avg W.   num of     num of   sss size
tasks         length   concurr.   sss      max   avg
 50      2     -         -         38       3    1.32
 50      4    4.22       7         43       3    1.16
 50      6    4.45      16         36       3     -
 50      8    6.21      21         31       4     -
 50     10    6.82      23         30       -     -
100      2    2.96      16         85       3    1.18
100      4    3.93      16         85       3    1.18
100      6    3.93      16         85       3    1.18
100      8    6.32      31         73       4    1.37
100     10    7.21      52         58       7    1.72
150      2    3.29      20        130       4    1.15
150      4    3.98      29        125       5    1.20
150      6    5.33      42        111       4    1.35
150      8    5.90      55        100       5    1.50
150     10    6.96      77         83       8    1.81
200      2    3.04      25        175       4    1.14
200      4    3.90      48        153       3    1.31
200      6    5.18      55        148       9    1.35
200      8    5.95      87        128       8    1.56
200     10    7.18      84        128       6    1.56
250      2    2.93      45        206       3    1.21
250      4    4.08      55        200       6    1.25
250      6    5.10      55        197       5    1.27
250      8    6.15      78        175       8    1.43
250     10    6.78      98        162       7    1.54

Table 4: α = 4, β = 10

 γ    avg W. length
 2    3.44
 4    4.11
 6    5.50
 8    6.69
10    6.61

num of   γ    num of   sss size
tasks         sss      max   avg
100      2     89       4    1.12
100      4     87       4    1.15
100      6     80       4    1.25
100      8     82       3    1.22
100     10     76       5    1.32
200      2    182       3    1.10
200      4    174       5    1.15
200      6    177       3    1.13
200      8    167       4    1.20
200     10    153       4    1.31
300      -    280       3    1.07
300      -    251       6    1.20
300      -    254       3    1.18
300      -    243       8    1.23
Static time-driven scheduling has been advocated for use in hard real-time systems and is particularly appropriate for many embedded systems. The approaches taken for static scheduling often use search techniques and may reduce the search by using heuristics. In this paper we present a technique for analyzing the temporal relations among the tasks, based on non-preemptive schedulability. The relationships can be used effectively to reduce the average complexity of scheduling these tasks. They also serve as a basis for selective preemption policies by providing an early test for infeasibility. We present examples and simulation results to confirm the usefulness of temporal analysis as a phase prior to scheduling.

1  Introduction 

Many safety-critical real-time applications, like process control, embedded tactical systems for military applications, air-traffic control, and robotics, have stringent timing constraints imposed on their computations due to the characteristics of the physical system. A failure to observe the timing constraints can result in intolerable system degradation and, in some cases, catastrophic consequences.

Scheduling is the primary means of ensuring that timing constraints are satisfied in such systems [1]. As a result, significant effort has been invested in research on hard real-time scheduling [2, 3, 4]. In this paper we discuss a static scheduling technique that guarantees timely execution of time-critical tasks.

The time-driven scheduling model is being used by many experimental systems, including MARS [5], MARUTI [6], and the Spring Kernel [7]. The static time-driven scheduling technique involves constructing a schedule offline, which may be represented as a Gantt chart [8] or calendar [6] (Figure 1). Tasks are invoked at run time whenever they are scheduled to execute. Such a scheduling model is particularly appropriate for many embedded systems. Recent effort in this direction has shown the viability of such an approach for practical real-time applications [9].

*This  research  was  supported  in  part  by  ONR  and  DARPA  under 
contract  N00014-91-C-0195. 


Figure  1:  Gantt  Chart  or  Calendar 

The intractability of most scheduling problems has led to approaches based on search techniques for scheduling real-time tasks. The feasibility of a task set is determined through construction of a schedule; failure to construct a schedule denotes infeasibility. Heuristics are often used as a means of controlling the complexity of scheduling. In many cases, heuristics perform well enough to yield an acceptable solution.

There has been little emphasis on the use of analytic techniques to assist in time-driven scheduling. Decomposition scheduling [10], based on dominance properties of sequences [11], uses analytic techniques to decompose a set of tasks into a sequence of subsets. Significant reduction in average complexity can be achieved if the set of tasks can be decomposed into a large number of subsets, each having a small number of tasks.

In this paper, we present an analysis technique for time-driven scheduling based on the timing requirements of tasks. The analysis establishes a set of temporal relations between pairs of tasks based on a non-preemptive scheduling model. Scheduling algorithms can use these relations to reduce the average-case complexity of scheduling and as an early test for infeasibility. As a test for infeasibility, the analysis provides a good basis for policies that use selective preemption to enhance feasibility. When infeasibility is not detected, a search algorithm may use the temporal relations to prune large portions of the search space, thereby controlling the cost of scheduling.

2  Time  Driven  Scheduling 

The time-driven scheduling approach constructs a calendar for the set of tasks in the system. The tasks may be scheduled preemptively or non-preemptively. The non-preemptive scheduling problem for a uniprocessor is known to be NP-complete [12]. When the tasks are mutually independent and can be preempted at any time, the earliest-deadline-first policy is known to be optimal [13], which obviates the need for non-preemptive scheduling. However, when tasks synchronize using critical sections, the preemptive scheduling problem is also known to be intractable (NP-hard) [14].

In general, when the overhead of preemption is negligible, the non-preemptive solutions form a subset of the preemptive solutions [8]. However, when tasks may interact with each other, the non-preemptive models are simpler, easier to implement, and closer to reality [15]. They are also necessary for certain scheduling domains, like I/O scheduling, and provide a basis for selective preemption policies.

2.1  Task  Model 

We consider a set of $n$ tasks $\Gamma = \{\tau_i : i = 1, 2, \ldots, n\}$
to be scheduled for execution on a single processor. Each task $\tau_i$,
abbreviated as $i$, is a 3-tuple $[r_i, c_i, d_i]$ denoting the ready time,
computation time and deadline, respectively. The time interval
$[r_i, d_i]$ is called the timing window $W_i$ of task $\tau_i$, and
indicates the time interval during which the task can execute. The
computation time of each task is less than the window length $d_i - r_i$.
All tasks are assumed to be independent for simplicity of exposition,
even though such a requirement is not necessary for the analysis.

In a hard real-time system, processes may be periodic or sporadic [14].
Such a set of processes may be mapped to our scheduling model by the
techniques identified in [1, 14, 16], and by constructing a schedule over
the least common multiple of the periods of the tasks.
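As a rough illustration of the periodic part of that mapping, the sketch below unrolls periodic processes into individual task instances over one hyperperiod (the LCM of the periods). It rests on a simplifying assumption: each instance is ready at the start of its period and due at its end. Sporadic processes, which the cited techniques also handle, are not covered. The names `Task` and `unroll_periodic` are hypothetical.

```python
from math import lcm
from typing import NamedTuple


class Task(NamedTuple):
    r: int  # ready time
    c: int  # computation time
    d: int  # deadline


def unroll_periodic(periodic):
    """Expand (period, computation_time) pairs into task instances over one
    hyperperiod. Assumption for illustration: instance k of a process with
    period p is ready at k*p and due at (k+1)*p."""
    hyper = lcm(*(p for p, _ in periodic))
    tasks = []
    for p, c in periodic:
        for k in range(hyper // p):
            tasks.append(Task(r=k * p, c=c, d=(k + 1) * p))
    return hyper, tasks


hyper, tasks = unroll_periodic([(4, 1), (6, 2)])
print(hyper, len(tasks))  # 12 5
```

`math.lcm` accepts several arguments from Python 3.9 onward. The resulting instances are independent `[r, c, d]` triples of the kind the task model above expects.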

2.2  Non-Preemptive  Scheduling  Model 

A non-preemptive schedule is a mapping of each task $\tau_i$ in $\Gamma$ to
a start time $s_i$. The task is then scheduled to run without preemption in
the time interval $[s_i, f_i]$, with its finish time being
$f_i = s_i + c_i$. A feasible schedule is a schedule in which the following
conditions are satisfied for each task $\tau_i$:

$$ r_i \le s_i \tag{1} $$

$$ f_i \le d_i \tag{2} $$

It is useful to consider a non-preemptive schedule as an ordered sequence
of the set of tasks. To get a maximally packed schedule from a sequence
$[\tau_1, \tau_2, \ldots, \tau_n]$, we can recursively derive the start time
$s_i$ and finish time $f_i$ of the tasks as follows:

$$ s_i = \max(r_i, f_{i-1}) \tag{3} $$

$$ f_i = s_i + c_i \tag{4} $$

with

$$ s_1 = r_1 $$

The scheduling problem can thus be considered as a search over the
permutation space. A permutation (sequence) is feasible if the
corresponding schedule is feasible. Notice that for any permutation
schedule derived as above, condition (1) is implied by (3), and we only
need to verify the deadline constraints (2) for the tasks.
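A minimal Python sketch of recurrences (3) and (4) and of the deadline check (2) follows. `Task`, `packed_schedule` and `is_feasible` are illustrative names and do not come from the original work.

```python
from typing import List, NamedTuple, Tuple


class Task(NamedTuple):
    r: float  # ready time r_i
    c: float  # computation time c_i
    d: float  # deadline d_i


def packed_schedule(seq: List[Task]) -> List[Tuple[float, float]]:
    """(start, finish) of each task in a maximally packed non-preemptive schedule."""
    times = []
    prev_finish = None
    for t in seq:
        # s_1 = r_1; otherwise s_i = max(r_i, f_{i-1}), eq. (3)
        s = t.r if prev_finish is None else max(t.r, prev_finish)
        f = s + t.c  # eq. (4)
        times.append((s, f))
        prev_finish = f
    return times


def is_feasible(seq: List[Task]) -> bool:
    """Condition (1) holds by construction, so only the deadlines (2) are checked."""
    return all(f <= t.d for t, (_, f) in zip(seq, packed_schedule(seq)))


a, b = Task(0, 2, 5), Task(1, 2, 3)
print(packed_schedule([b, a]), is_feasible([b, a]), is_feasible([a, b]))
# [(1, 3), (3, 5)] True False
```

The same two tasks are feasible in one order and infeasible in the other. This is why the problem becomes a search over sequences.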

Figure 2: Infeasibility of task A executing before task B

3  Temporal  Analysis 

Temporal analysis uses pairwise schedulability analysis of tasks to
generate a set of relations that eliminate sequences which cannot lead to
feasible solutions. In this section we define the temporal relations and
show how they may be derived from the timing constraints of tasks.

3.1  Definitions  of  Temporal  Relations 

Consider two tasks $\tau_i$ and $\tau_j$. We wish to determine what can be
said about the relative ordering of these tasks, given their timing
constraints. The relations defined below identify the different
possibilities.

Precedence Relation: A precedence relation, denoted $\tau_i \rightarrow \tau_j$,
implies that in any feasible schedule $\tau_i$ must execute before $\tau_j$.

Infeasible Relation: An infeasible relation, denoted $\tau_i \otimes \tau_j$,
implies that $\tau_i$ and $\tau_j$ cannot be run one after the other in
either order. Hence no feasible schedule exists.

Concurrent Relation: $\tau_i \parallel \tau_j$ if there is no precedence or
infeasible relation between them. A concurrent relation indicates that a
feasible schedule may exist with either order of the tasks $\tau_i$ and
$\tau_j$. It does not, however, guarantee the existence of a feasible
schedule.

For each task $\tau_i$, let us define two terms $e_i$ and $l_i$, denoting
the earliest finish time and the latest start time, as:

$$ e_i = r_i + c_i \tag{5} $$

$$ l_i = d_i - c_i \tag{6} $$

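A preliminary set of relations can be established using the following
rules, for every pair of tasks $\tau_i$ and $\tau_j$:

$$ (e_i \le l_j) \wedge (l_i < e_j) \Rightarrow \tau_i \rightarrow \tau_j \tag{7} $$

$$ (e_i > l_j) \wedge (l_i \ge e_j) \Rightarrow \tau_j \rightarrow \tau_i \tag{8} $$

$$ (e_i \le l_j) \wedge (l_i \ge e_j) \Rightarrow \tau_i \parallel \tau_j \tag{9} $$

$$ (e_i > l_j) \wedge (l_i < e_j) \Rightarrow \tau_i \otimes \tau_j \tag{10} $$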


The basic idea is that if the earliest finish time of a task A is greater
than the latest start time of a task B, then no feasible schedule can be
found in which A is scheduled before B (Figure 2). Thus, for instance, the
first part of the condition for rule (10) says that $\tau_i$ cannot precede
$\tau_j$, and the second part says that $\tau_j$ cannot precede $\tau_i$,
establishing the infeasible relation.

Figure 3: Window Modification (A → B)

Figure 5: Precedence Graph for the example of Figure 4
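Rules (7) to (10) reduce to two Boolean tests per pair: "can $\tau_i$ finish before $\tau_j$ must start" ($e_i \le l_j$) and the symmetric test. The Python sketch below classifies a pair this way. The `Relation` enum and the function name are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class Task(NamedTuple):
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline


class Relation(Enum):
    PRECEDES = "i -> j"     # rule (7)
    FOLLOWS = "j -> i"      # rule (8)
    CONCURRENT = "i || j"   # rule (9)
    INFEASIBLE = "i (x) j"  # rule (10)


def relation(ti: Task, tj: Task) -> Relation:
    e_i, l_i = ti.r + ti.c, ti.d - ti.c  # eqs. (5) and (6)
    e_j, l_j = tj.r + tj.c, tj.d - tj.c
    i_then_j = e_i <= l_j  # tau_i can finish before tau_j must start
    j_then_i = e_j <= l_i  # tau_j can finish before tau_i must start
    if i_then_j and j_then_i:
        return Relation.CONCURRENT
    if i_then_j:
        return Relation.PRECEDES
    if j_then_i:
        return Relation.FOLLOWS
    return Relation.INFEASIBLE


print(relation(Task(0, 2, 5), Task(1, 2, 3)))  # Relation.FOLLOWS
```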

3.2  Window  Modification 

Consider two tasks $\tau_a$ and $\tau_b$ and a precedence relation
$\tau_a \rightarrow \tau_b$ between them. As this indicates that in any
feasible schedule $\tau_a$ must precede $\tau_b$, we can update the timing
windows as follows (Figure 3):

$$ d'_a = \min(d_a, l_b) \tag{11} $$

$$ r'_b = \max(r_b, e_a) \tag{12} $$

The window modification does not alter the scheduling problem, in the
sense that every feasible sequence with the original timing constraints
is a feasible sequence with the modified timing constraints, and vice
versa. Further, the schedules for feasible sequences are identical in
both cases. A task's window may shrink because of window modification,
which may change the relation of the modified task with other tasks. The
procedure may therefore be applied iteratively until no further changes
can be made or an infeasible relation is detected.
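The iterative procedure can be sketched in Python as a fixpoint loop. Each round recomputes the pairwise relations, applies (11) and (12) for every precedence pair, and stops either when nothing changes or when an infeasible relation (or a window too short for its task) appears. This is a straightforward, unoptimized illustration. `temporal_analysis` is a hypothetical name, and the loop does not reproduce the authors' implementation.

```python
from typing import NamedTuple


class Task(NamedTuple):
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline


def temporal_analysis(tasks):
    """Iterate rules (7)-(10) and window modification (11)-(12) to a fixpoint.
    Returns (refined tasks, set of precedence pairs (a, b) meaning a -> b),
    or None if infeasibility is detected."""
    tasks = list(tasks)
    while True:
        prec = set()
        for i in range(len(tasks)):
            for j in range(i + 1, len(tasks)):
                ti, tj = tasks[i], tasks[j]
                i_first = ti.r + ti.c <= tj.d - tj.c  # e_i <= l_j
                j_first = tj.r + tj.c <= ti.d - ti.c  # e_j <= l_i
                if not (i_first or j_first):
                    return None  # rule (10): infeasible relation
                if i_first and not j_first:
                    prec.add((i, j))  # rule (7)
                elif j_first and not i_first:
                    prec.add((j, i))  # rule (8)
        changed = False
        for a, b in prec:
            ta, tb = tasks[a], tasks[b]
            new_da = min(ta.d, tb.d - tb.c)  # (11): d'_a = min(d_a, l_b)
            new_rb = max(tb.r, ta.r + ta.c)  # (12): r'_b = max(r_b, e_a)
            if new_da != ta.d or new_rb != tb.r:
                tasks[a] = ta._replace(d=new_da)
                tasks[b] = tasks[b]._replace(r=new_rb)
                changed = True
        if any(t.r + t.c > t.d for t in tasks):
            return None  # some window became shorter than its computation time
        if not changed:
            return tasks, prec
```

For `[Task(0, 2, 5), Task(1, 2, 3)]`, the analysis finds that task 1 must precede task 0 and moves the ready time of task 0 from 0 to 3. For `[Task(0, 3, 5), Task(1, 2, 4)]`, it reports infeasibility immediately. Windows only shrink, so the loop terminates.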

3.3  Examples 

(a) Consider a set of five tasks as shown in Figure 4. The temporal
analysis leads us to the following set of precedence relations, sans the
redundant ones:

{ro  — ^  r3,ri  — ►  r5,T3  — ►  ^  T5} 

The set of precedence relations may be represented as a precedence graph
(Figure 5), which imposes a partial order on the task set. Only sequences
that are consistent with this partial order need to be considered for
scheduling. For 5 tasks, the total number of permutations is 120 (= 5!).
The number of total orders consistent with the partial order of Figure 5
is 12, a drastic reduction in the number of sequences that need to be
considered for scheduling. The modified task set is shown in Figure 4(b),
with the modified values in bold.
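To see how quickly precedence relations shrink the permutation space, the brute-force Python sketch below counts the orders of $n$ tasks that respect a given set of precedence pairs. The edge lists are hypothetical examples and are not the graph of Figure 5, whose exact relations are not reproduced here. Brute force is only usable for tiny $n$.

```python
from itertools import permutations


def count_consistent_orders(n, edges):
    """Count permutations of tasks 0..n-1 in which every pair (a, b) in
    `edges` places task a before task b (brute force, small n only)."""
    count = 0
    for perm in permutations(range(n)):
        position = {task: k for k, task in enumerate(perm)}
        if all(position[a] < position[b] for a, b in edges):
            count += 1
    return count


print(count_consistent_orders(5, []))        # 120
print(count_consistent_orders(5, [(0, 1)]))  # 60
```

A single relation already halves the space. A full chain over all five tasks leaves exactly one order.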

(b) As another example, consider the set of 4 tasks shown in Figure 6.
The task set is shown at different stages of temporal analysis, with the
new temporal relations¹ at each stage. This example shows how successive
refinement of temporal relations can lead to the detection of
infeasibility.

3.4  Complexity  of  Temporal  Analysis 

It is easy to see that the initial set of relations can be established in
$O(n^2)$ time. Further, each phase of refinement also takes no more than
$O(n^2)$ time. An upper bound for the number of phases is $n$. Therefore,
the worst-case complexity of temporal analysis is $O(n^3)$. In practice,
however, the cost of temporal analysis can be significantly less, since
concurrent relations and relations between non-overlapping tasks need not
be generated explicitly. Furthermore, the number of phases required to
stabilize window modification can be reduced if the release times are
modified in the topological sort order of the precedence graph and the
deadlines are modified in the reverse topological sort order.
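For a fixed precedence graph, the ordering trick in the preceding paragraph means one forward pass and one backward pass reach the fixpoint of (11) and (12). Release times only depend on predecessors, and deadlines only depend on successors. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). `propagate_windows` is a hypothetical helper. In full temporal analysis, the shrunken windows may create new relations, so the analysis would still re-run the pairwise tests afterwards.

```python
from graphlib import TopologicalSorter
from typing import NamedTuple


class Task(NamedTuple):
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline


def propagate_windows(tasks, edges):
    """Apply (12) in topological order and (11) in reverse topological order
    for a fixed list of precedence pairs (a, b) meaning tasks[a] -> tasks[b]."""
    tasks = list(tasks)
    preds = {i: set() for i in range(len(tasks))}
    succs = {i: set() for i in range(len(tasks))}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    order = list(TopologicalSorter(preds).static_order())  # predecessors first
    for b in order:  # every predecessor of b already has its final ready time
        for a in preds[b]:
            ta = tasks[a]
            tasks[b] = tasks[b]._replace(r=max(tasks[b].r, ta.r + ta.c))
    for a in reversed(order):  # every successor of a already has its final deadline
        for b in succs[a]:
            tb = tasks[b]
            tasks[a] = tasks[a]._replace(d=min(tasks[a].d, tb.d - tb.c))
    return tasks


chain = [Task(0, 2, 10), Task(0, 3, 10), Task(0, 1, 10)]
print(propagate_windows(chain, [(0, 1), (1, 2)]))
# [Task(r=0, c=2, d=6), Task(r=2, c=3, d=9), Task(r=5, c=1, d=10)]
```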

In any case, the cost of temporal analysis for static scheduling is not
significant when it is used in conjunction with an exponential-time
scheduling algorithm. In Section 5, we show empirically that the cost of
temporal analysis is not a significant factor for static scheduling.

4 Non-Preemptive Scheduling using Temporal Analysis

The  relations  established  through  temporal  analysis  serve  as  a 
basis  for  scheduling  of  the  tasks.  Temporal  analysis  may  thus 
be  perceived  as  a  pre-processing  stage  for  scheduling.  The 
result  of  this  pre-processing  stage  is  one  of  the  following: 

1.  The  task  set  was  detected  to  be  infeasible,  due  to  the 
existence  of  one  or  more  infeasible  relations. 

2. A set of precedence relations was established, generating a
precedence graph. The precedence graph imposes a partial order on the
set of tasks. It serves as an input to the scheduler, which may exploit
this partial order to prune the search space.

4.1  Detecting  Infeasibility 

Whenever an infeasible relation exists between two tasks, no ordering of
the two tasks is feasible. Thus, the detection of an infeasible relation
at any stage of temporal analysis indicates that the task set is
infeasible. Even though only pairwise schedulability analysis is used to
establish relations, successive refinement of relations can propagate
this effect to other tasks. This effect is illustrated by the example of
Figure 6, where several iterations lead to an infeasible relation. It
must be noted that whenever infeasibility is detected, the resulting task
set and its relations also provide good feedback as to what caused it.
The feedback information may be used to allocate more resources, change
the resource allocation, or allow for selective preemption, as the case
may be.

¹ Concurrent relations are not shown.

Figure 4: Window Modification: (a) Original Task Set (b) Task Set after Temporal Analysis

Figure 6: Example for Determining Infeasibility with Temporal Analysis

4.2  Search  Technique  for  Scheduling 

The intractability of non-preemptive scheduling has led to implicit
enumeration techniques based on branch-and-bound search methods. The
search space is the set of all possible permutation sequences. One way of
enumerating schedules is to generate an initial schedule and then
successively refine it using heuristics to generate “better” schedules,
until a feasible schedule is obtained [3, 17, 16].

In this paper, we concentrate on another enumeration method, which
constructs a schedule in an incremental manner. Variants of this method
have been used in [4, 18, 19, 20, 21]. The search space is represented as
a search tree. The root (level 0) of the tree is an empty schedule. The
nodes of the tree represent partial schedules. A node at level k gives a
partial schedule with k tasks. The leaves are complete schedules. The
successors of an intermediate node are immediate extensions of the
partial schedule corresponding to that node. From a node at level k,
there are at most n − k branches, each corresponding to an extension of
the partial schedule by appending one more task to the schedule. Search
is done in a branch-and-bound manner, wherein parts of the search tree
are pruned when it is determined that no feasible schedule can arise from
them. For each node being expanded, the following conditions must hold.

1. All immediate extensions of the node must be feasible [4, 18].

2.  The  remaining  computational  demand  must  not  exceed 
the  difference  between  the  largest  deadline  of  remaining 
tasks  and  current  scheduling  time  [4]. 

If  any  condition  is  violated  then  no  feasible  schedule  can 
be  generated  in  the  subtree  originating  from  this  node.  No 
search  is  conducted  on  the  subtree  rooted  at  such  a  node. 
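The two conditions translate directly into a node-expansion test. In the Python sketch below, the "current scheduling time" is taken to be the finish time of the last task in the partial schedule. Condition 1 is safe to use for pruning because a task that cannot meet its deadline when run next cannot meet it when run later either. `can_expand` is a hypothetical name.

```python
from typing import NamedTuple, Sequence


class Task(NamedTuple):
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline


def can_expand(t_now: float, remaining: Sequence[Task]) -> bool:
    """Expansion test for a partial schedule whose last task finishes at t_now.
    `remaining` holds the unscheduled tasks; False means prune the subtree."""
    if not remaining:
        return True
    # Condition 1: every immediate extension (append one task) must be feasible.
    if any(max(t.r, t_now) + t.c > t.d for t in remaining):
        return False
    # Condition 2: remaining demand must fit before the largest remaining deadline.
    return sum(t.c for t in remaining) <= max(t.d for t in remaining) - t_now
```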

4.2.1 Heuristically Guided Scheduling

Heuristics are commonly used to guide search in many combinatorial search
problems. For non-preemptive scheduling, heuristics may be used to guide
search along paths that are more likely to lead to feasible schedules.
Search is done in a depth-first manner until either a complete feasible
schedule is found, in which case the search terminates, or it is
determined that no possible extension of the current node can lead to a
feasible schedule. Heuristics are used to determine which of the many
children of a node should be searched next. Backtracking takes place when
no further extensions of a node can be made. We evaluate temporal
analysis using such a heuristic search for scheduling.

5 Empirical Evaluation of Temporal Analysis

In the previous sections, we have shown how temporal analysis may be used
to restrict the search space for scheduling. Clearly, the existence of
even a few precedence relations results in a drastic reduction of the
search space². However, the usefulness of the scheme is not obvious,
since we are only interested in feasible schedules, and hence a large
part of the search space may never need to be examined. We have conducted
various simulations to verify that temporal analysis indeed improves
scheduling performance. For reasons of space, we mention only a few
significant results.

We used a heuristic search technique for scheduling as described in
Section 4.2. The heuristic used for our simulation study was a two-level
heuristic. The primary heuristic was earliest start time (EST):

$$ EST_i = \max(r_i, f_k) $$

where $k$ is the last task in the partial schedule at that node.

In the case of a conflict, the secondary heuristic, earliest deadline,
was used. Further conflicts were resolved arbitrarily. The heuristic has
a natural intuitive appeal and is known to produce good results among
linear heuristics [22].
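The Python sketch below combines the depth-first search of Section 4.2.1, the two pruning conditions of Section 4.2, and the EST/earliest-deadline child ordering into one recursive function. It illustrates the search scheme under those assumptions and is not the simulator used for the results below. `dfs_schedule` is a hypothetical name, and ties left after the two keys are broken by Python's stable sort rather than arbitrarily.

```python
from typing import List, NamedTuple, Optional


class Task(NamedTuple):
    r: float  # ready time
    c: float  # computation time
    d: float  # deadline


def dfs_schedule(tasks: List[Task]) -> Optional[List[int]]:
    """Depth-first branch-and-bound search over partial schedules. Children
    are tried by earliest start time, ties broken by earliest deadline.
    Returns a feasible sequence of task indices, or None."""

    def prunable(t_now, rest):
        # Condition 1: some remaining task would miss its deadline even if run next.
        if any(max(tasks[i].r, t_now) + tasks[i].c > tasks[i].d for i in rest):
            return True
        # Condition 2: remaining demand exceeds the time left before the largest deadline.
        return sum(tasks[i].c for i in rest) > max(tasks[i].d for i in rest) - t_now

    def search(seq, t_now, rest):
        if not rest:
            return seq  # leaf: complete feasible schedule
        if prunable(t_now, rest):
            return None
        # EST_i = max(r_i, f_k), where t_now = f_k is the finish time of the last task.
        for i in sorted(rest, key=lambda i: (max(tasks[i].r, t_now), tasks[i].d)):
            finish = max(tasks[i].r, t_now) + tasks[i].c
            result = search(seq + [i], finish, rest - {i})
            if result is not None:
                return result
        return None  # every child failed: backtrack

    # At the root no task has run yet; -inf makes EST_i reduce to r_i.
    return search([], float("-inf"), frozenset(range(len(tasks))))


print(dfs_schedule([Task(0, 2, 5), Task(1, 2, 3)]))  # [1, 0]
print(dfs_schedule([Task(0, 3, 5), Task(1, 2, 4)]))  # None
```

In the first call, the heuristic tries task 0 first because it has the earlier start time. That branch is pruned, and the search backtracks to task 1.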

For each set of parameters, we generated 200 “feasible” task sets with
100 tasks each. The task sets were generated with 100% utilization, as
this presents the most difficulty for scheduling. The computation times
were generated using uniform distributions and the laxities using a
normal distribution. We compared the success percentage (i.e. the
percentage of successfully scheduled task sets) of scheduling with and
without temporal analysis as a pre-processing stage. The success
percentage (SP) is plotted against the “cut-off time”, indicating the
maximum time allowed to the scheduling algorithm to successfully generate
a schedule.

Our simulation results show that temporal analysis is not needed for
scheduling when both the mean and the variation in laxities are low,
since the simple heuristics were able to schedule almost all task sets
(success ratio ≈ 1.0). However, when the laxities are high (as compared
to computation times) and the variation in laxities is also high³, the
heuristics do not perform as well, and the use of temporal analysis
results in a 10–20% improvement in success ratio.

² Even one relation reduces the search space by half.

³ Note that the task set utilization is 100%.

Figure 7: Success Ratio vs Cut-off-Time, $\mu_\ell = 5.0\,\mu_c$, $COV_\ell = 1.0$

As an illustration, we show a few plots of the success percentage (SP) of
scheduling with temporal analysis (TAS), contrasted with the success
percentage of scheduling without temporal analysis, i.e. the baseline
scheduling model (BM). For scheduling with temporal analysis, we consider
two cases: one in which the overhead of temporal analysis is added to the
scheduling time (TAS+), and one in which it is not (TAS−). The parameters
varied are the mean laxity $\mu_\ell$, expressed in terms of the mean
computation time $\mu_c$, and the coefficient of variation of laxity
$COV_\ell$. Figures 7 and 8 show the plots for a low laxity mean with low
and high variation, respectively. In this case, there is no significant
performance improvement due to temporal analysis, and both schemes
achieve almost 100% success percentage. On the other hand, when the
average laxity is high (Figures 9 and 10), coupled with high variation,
temporal analysis results in a significant improvement in performance.
The plots also show that the curves for (TAS+) and (TAS−) are almost
identical, showing that the overhead of temporal analysis is minimal
compared to the scheduling costs.

6  Concluding  Remarks 

In this paper we have presented temporal analysis as a technique for
analyzing the timing relationships among a set of tasks to establish
constraints on scheduling that are discernible from a pairwise analysis.
The implications and the benefits of the approach as a pre-processing
stage for scheduling have been shown through examples and simulation.

Time-driven scheduling theory has relied heavily on search techniques for
scheduling, and little work has been done in developing analytic
techniques. Temporal analysis is a step in this direction and provides an
efficient way of analyzing a task set and deducing valuable information
for scheduling.

Figure 8: Success Ratio vs Cut-off-Time, $\mu_\ell = 5.0\,\mu_c$, $COV_\ell = 2.0$

Figure 9: Success Percentage vs Cut-off-Time, $\mu_\ell = 10.0\,\mu_c$, $COV_\ell = 1.0$

The existence of an infeasible relation in a task set gives a sufficient
condition for infeasibility. This provides an early test for
infeasibility, which can then be used as a basis for selective
preemption to enhance feasibility. Alternatively, the detection of
infeasibility may be used to allocate more resources or change the
resource allocation.

The precedence relations generated as a result of temporal analysis
impose a partial order on the task set and may be effectively used to
prune the search space for scheduling. Our simulations confirm that
temporal analysis helps improve the performance of a scheduling algorithm
without incurring a significant overhead. In the simplest scheduling
case, when heuristics perform very well, temporal analysis might be
perceived as a way of formalizing the heuristics. For static time-driven
scheduling to be a feasible technique, it is imperative that the
scheduling cost be controlled as the size of the problem increases.
Temporal analysis provides a step in the right direction.

In this paper we have been concerned with single-processor scheduling. An
interesting extension of temporal analysis would be to use it for
multi-processor scheduling. One way to extend the analysis to
multi-processor scheduling is to perform it in two phases. In the first
phase, the infeasible and concurrent relations may be used to obtain an
allocation of tasks to processors. Then, in the second phase, the
analysis shown in this paper can be used for scheduling on each
processor.

Many real-time system specifications impose relative timing constraints
on the tasks [23, 24]. In this paper, we have restricted ourselves to
absolute constraints on the start and finish times of tasks. When more
complex constraints are imposed on tasks, the role of temporal analysis
in reducing the search space becomes even more important, since simple
heuristics are unlikely to perform well. It would be interesting to see
how temporal analysis can be extended to use such constraints to further
prune the search space.

We are currently implementing a scheduling tool based on the results
shown in this paper. The tool is being developed for the MARUTI project,
an experimental real-time system prototype being developed at the
University of Maryland, based on the concept of pre-scheduling [6].
Abstract

A simple approach to inter-domain routing is domain-level source routing
with a link-state approach, where each node maintains a domain-level view
of the internetwork. This does not scale up to large internetworks. The
usual scaling technique of aggregating domains into superdomains loses
ToS and policy detail.

We present a new viewserver hierarchy and associated protocols that (1)
satisfy policy and ToS constraints, (2) adapt to dynamic topology changes,
including failures that partition domains, and (3) scale well to a large
number of domains without losing detail. Domain-level views are
maintained by special nodes called viewservers. Each viewserver maintains
a domain-level view of a surrounding precinct. Viewservers are organized
hierarchically. To obtain domain-level source routes, the views of one or
more viewservers are merged (up to a maximum of twice the number of
levels in the hierarchy).

We also present a model for evaluating inter-domain routing protocols,
and apply this model to compare our viewserver hierarchy against the
simple approach. Our results indicate that the viewserver hierarchy finds
many short valid paths and reduces the memory requirement by two orders
of magnitude.
tecture and Design: packet networks; store and forward networks;
C.2.2 [Computer-Communication Networks]: Network Protocols: protocol
architecture; C.2.m [Routing Protocols]; F.2.m [Computer Network Routing
Protocols].
1 Introduction


A computer internetwork, such as the Internet, is an interconnection of
backbone networks, regional networks, metropolitan area networks, and
stub networks (campus networks, office networks and other small
networks)¹. Stub networks are the producers and consumers of the
internetwork traffic, while backbones, regionals, and MANs are transit
networks. (Most of the networks in an internetwork are stub networks.)
Each network consists of nodes (hosts, routers) and links. Two networks
are neighbors when there are one or more links between nodes in the two
networks (see Figure 1).


Figure  1:  A  portion  of  an  internetwork.  (Circles  represent  stub  networks.) 

An internetwork is organized into domains². A domain is a set of networks
(possibly consisting of only one network) administered by the same
agency. Within each domain, an intra-domain routing protocol is executed
that provides routes between source and destination nodes in the domain.
This protocol can be any of the typical ones, i.e., next-hop or source
routes computed using distance-vector or link-state algorithms.

Across all domains, an inter-domain routing protocol is executed that
provides routes between source and destination nodes in different
domains. This protocol must satisfy various constraints:

(1) It must satisfy policy constraints, which are administrative
restrictions on the inter-domain traffic [8, 12, 9, 5]. Policy
constraints are of two types: transit policies and source policies. The
transit policies of a domain A specify how other domains can use the
resources of A (e.g. $0.01 per packet, no traffic from domain B). The
source policies of a domain A specify constraints on traffic originating
from A (e.g. domains to avoid/prefer, acceptable connection cost).
Transit policies of a domain are public (i.e. available to other
domains), whereas source policies are usually private.

¹ For example, NSFNET and MILNET are backbones, and SURAnet and CERFnet are regionals.

² Also referred to as routing domains.

(2) An inter-domain routing protocol must also satisfy type-of-service
(ToS) constraints of applications (e.g. low delay, high throughput, high
reliability, minimum monetary cost). To do this, it must keep track of
the types of services offered by each domain [5].

(3)  Inter-domain  routing  protocols  must  scale  up  to  very  large  internetworks,  i.e.  with  a  very  large 
number  of  domains.  Practically  this  means  that  processing,  memory  and  communication 
requirements  should  be  much  less  than  linear  in  the  number  of  domains. 

(4) Inter-domain routing protocols must automatically adapt to link cost
changes, node/link failures and repairs, including failures that
partition domains [15]. They must also handle non-hierarchical domain
interconnections at any level [9] (e.g. we do not want to hand-configure
special routes as “back-doors”).

A simple (or straightforward) approach to inter-domain routing is
domain-level source routing with a link-state approach [8, 5]. In this
approach, each router³ maintains a domain-level view of the internetwork,
i.e., a graph with a vertex for every domain and an edge between every
two neighbor domains. Policy and ToS information is attached to the
vertices and the edges of the view.

When a source node needs to reach a destination node, it (or a router⁴ in
the source’s domain) first examines this view and determines a
domain-level source route satisfying ToS and policy constraints, i.e., a
sequence of domain ids starting from the source’s domain and ending with
the destination’s domain. Then, the packets are routed to the destination
using this domain-level source route and the intra-domain routing
protocols of the domains crossed.
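The route computation in the simple approach can be illustrated with a breadth-first search over the domain-level view that skips transit domains whose policy rejects the traffic. In the Python sketch below, policy and ToS checks are collapsed into one hypothetical predicate `may_transit(domain, src)`. This simplification ignores costs and policies that depend on the whole path. The domain names are made up.

```python
from collections import deque


def domain_source_route(view, src, dst, may_transit):
    """Fewest-domain route from src to dst in a domain-level view.

    view        -- dict mapping each domain id to the set of its neighbor domains
    may_transit -- hypothetical predicate (domain, src) -> bool standing in for
                   transit policies / ToS checks of intermediate domains
    Returns a list of domain ids from src to dst, or None if no valid route exists.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        dom = queue.popleft()
        if dom == dst:
            route = []
            while dom is not None:  # walk parent links back to src
                route.append(dom)
                dom = parent[dom]
            return route[::-1]
        for nbr in view.get(dom, ()):
            if nbr in parent:
                continue
            if nbr != dst and not may_transit(nbr, src):
                continue  # nbr would be a transit domain that refuses this traffic
            parent[nbr] = dom
            queue.append(nbr)
    return None


view = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "E"},
        "E": {"C", "D"}, "D": {"B", "E"}}
print(domain_source_route(view, "A", "D", lambda d, s: True))  # ['A', 'B', 'D']
no_a_via_b = lambda d, s: not (d == "B" and s == "A")  # B carries no traffic from A
print(domain_source_route(view, "A", "D", no_a_via_b))  # ['A', 'C', 'E', 'D']
```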

The disadvantage of this simple scheme is that it does not scale up for
large internetworks. The storage at each router is proportional to
$N_D \times E_D$, where $N_D$ is the number of domains and $E_D$ is the
average number of neighbor domains of a domain. The communication cost is
proportional to $N_R \times E_R$, where $N_R$ is the number of routers in
the internetwork and $E_R$ is the average number of neighbor routers of a
router (topology changes are flooded to all routers in the
internetwork).

To achieve scaling, several approaches based on aggregating domains into
superdomains have been proposed [13, 16, 6]. These approaches have
drawbacks because the aggregation results in a loss of detail (discussed
in Section 2).

³ Not all nodes maintain routing tables. A router is a node that maintains a routing table.

⁴ Referred to as the policy server in [8].

Our  protocol 

In  this  paper,  we  present  an  inter-domain  routing  protocol  that  we  have  proposed  recently [3].  It 
combines  domain-level  views  with  a  novel  hierarchical  scheme.  It  scales  well  to  large  internetworks, 
and  does  not  suffer  from  the  problems  of  superdomains. 

In our scheme, domain-level views are not maintained by every router but
by special nodes called viewservers. For each viewserver, there is a
subset of domains around it, referred to as the viewserver’s precinct.
The viewserver maintains the domain-level view of its precinct. This
solves the scaling problem for the storage requirement.

A viewserver can provide domain-level source routes between source and
destination nodes in its precinct. Obtaining a domain-level source route
between a source and a destination that are not in any single view
involves accumulating the views of a sequence of viewservers. To make
this process efficient, viewservers are organized hierarchically in
levels, and an associated addressing structure is used. Each node has a
set of addresses. Each address is a sequence of viewserver ids of
decreasing levels, starting at the top level and going towards the node.
The idea is that when the views of the viewservers in an address are
merged, the merged view contains domain-level routes to the node from the
top-level viewservers. (Addresses are obtained from name servers in the
same way as is currently done in the Internet.)
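The key effect of merging can be seen in a toy Python example. Two hypothetical precinct views overlap in one domain, and neither view contains a route from S to D, but their union does. The sketch models views as adjacency sets and merging as set union. It does not model how precincts are chosen, how addresses select viewservers, or the view-query messages themselves. All names and view contents are assumptions.

```python
from collections import deque


def merge_views(views):
    """Union of domain-level views; each view maps a domain id to the set of
    its neighbor domains as known inside one viewserver's precinct."""
    merged = {}
    for view in views:
        for dom, nbrs in view.items():
            merged.setdefault(dom, set()).update(nbrs)
    return merged


def has_route(view, src, dst):
    """Breadth-first reachability check on a domain-level view."""
    seen, queue = {src}, deque([src])
    while queue:
        dom = queue.popleft()
        if dom == dst:
            return True
        for nbr in view.get(dom, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False


# Hypothetical precincts that overlap only in domain "P".
near_source = {"S": {"A"}, "A": {"S", "P"}, "P": {"A"}}
near_dest = {"P": {"B"}, "B": {"P", "D"}, "D": {"B"}}
print(has_route(near_source, "S", "D"), has_route(near_dest, "S", "D"))  # False False
print(has_route(merge_views([near_source, near_dest]), "S", "D"))        # True
```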

We handle dynamic topology changes such as node/link failures and
repairs, link cost changes, and domain partitions. Gateways⁵ detect
domain-level topology changes affecting their domain and neighbor
domains. For each domain, there is a reporting gateway that communicates
these changes by flooding to the viewservers in a specified subset of
domains; this subset is referred to as its flood area. Hence, the number
of packets used during flooding is proportional to the size of the flood
area. This solves the scaling problem for the communication requirement.

Thus, our inter-domain routing protocol consists of two subprotocols: a
view-query protocol between routers and viewservers for obtaining merged
views, and a view-update protocol between gateways and viewservers for
updating domain-level views.
⁵ A node is called a gateway if it has a link to another domain.


Evaluation 


Many inter-domain routing protocols have been proposed, based on various
kinds of hierarchies. How do these protocols compare against each other
and against the simple approach? To answer this question, we need a
model in which we can define internetwork topologies, policy/ToS
constraints, inter-domain routing hierarchies, and evaluation measures
(e.g. memory and time requirements) for inter-domain routing protocols.
None of these protocols has been evaluated in a way that allows it to be
compared against the others or against the simple approach.

In this paper, we present such a model and use it to compare our
viewserver hierarchy to the simple approach. Our evaluation measures are
the amount of memory required at the source and at the routers, the
amount of time needed to construct a path, and the number of valid paths
found (and their lengths) in comparison to the number of available valid
paths (and their lengths) in the internetwork. We use three internetwork
topologies, each of size 11,110 domains (roughly the current size of the
Internet). Our results indicate that the viewserver hierarchy finds many
short valid paths and reduces the memory requirement by two orders of
magnitude.

Organization  of  the  paper 

In Section 2, we survey recent approaches to inter-domain routing. In Section 3, we present the view-query protocol for static network conditions, that is, assuming all links and nodes of the network remain operational. In Section 4, we present the view-update protocol to handle topology changes (this section is not needed for the evaluation part). In Section 5, we present our evaluation model and the results of applying it to the viewserver hierarchy. In Section 6, we conclude and describe how to add fault-tolerance and caching schemes to improve performance.

2  Related  Work 

In this section, we survey recently proposed inter-domain routing protocols that support ToS and policy routing for large internetworks [14, 16, 13, 10, 6, 20, 2, 19, 18, 7].

Several inter-domain routing protocols (e.g. BGP [14], IDRP [16], NR [10]) are based on the path-vector approach [17]. Here, for each destination domain a router maintains a set of paths, one through each of its neighbor routers. ToS and policy information is attached to these paths. Each router requires O(N_d × N_d × E_r) space. For each destination, a router exchanges its best valid path^ with its neighbor routers. However, a path-vector algorithm may not find a valid path from a source to the destination even if such a route exists [13]^. By exchanging k paths to each destination, the probability of detecting a valid path for each source can be increased.
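The failure mode described above can be illustrated with a minimal Python sketch of the footnote scenario. Several choices here are illustrative assumptions and do not reflect the actual BGP, IDRP or NR mechanisms: paths are encoded as tuples of domain ids, "best" is read as "shortest", and the source policy and the parameter k are hypothetical.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # hypothetical encoding: a domain-level path as a tuple of domain ids


def advertise(paths: List[Path], k: int) -> List[Path]:
    """Paths router u advertises to a neighbor; 'best' is assumed to mean shortest."""
    return sorted(paths, key=len)[:k]


def usable_at_neighbor(advertised: List[Path],
                       source_policy: Callable[[Path], bool]) -> List[Path]:
    """Advertised paths that also satisfy the neighbor's own source policy."""
    return [p for p in advertised if source_policy(p)]


# Footnote scenario: u knows P1 and P2, and v's domain refuses transit through "X".
P1: Path = ("A", "X", "D")
P2: Path = ("A", "Y", "Z", "D")


def v_policy(p: Path) -> bool:
    return "X" not in p
```

With k = 1, v receives only P1, rejects it, and finds no valid path even though P2 exists. With k = 2 it finds P2, at the price of storing and exchanging more paths.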

The most common approach to solving the scaling problem is to use superdomains^ (e.g. IDPR [13], IDRP [16], Nimrod [6]). Superdomains extend the idea of area hierarchy [11]. Here, domains are grouped hierarchically into superdomains: “close” domains are grouped into level 1 superdomains, “close” level 1 superdomains are grouped into level 2 superdomains, and so on. Each domain A is addressed by concatenating the superdomain ids starting from a top level superdomain and going down towards A. A router maintains a view that contains the domains in the same level 1 superdomain, the level 1 superdomains in the same level 2 superdomain, and so on. Thus a router maintains a smaller view than it would in the absence of hierarchy. Each superdomain has its own ToS and policy constraints derived from those of its subdomains.
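As an illustration of the hierarchical view, the sketch below computes what a router sees under a strict (non-overlapping) superdomain hierarchy. It makes a hypothetical assumption: every domain address is a tuple of superdomain ids of the same depth, ending with the domain id. ToS and policy attributes are ignored.

```python
from typing import Iterable, Set, Tuple

Address = Tuple[str, ...]  # superdomain ids from the top level down, ending with the domain id


def superdomain_view(own: Address, all_domains: Iterable[Address]) -> Set[Address]:
    """Entries visible to a router in domain `own` under a strict superdomain hierarchy.

    At each level i the router sees the children of its own level-i ancestor (the
    top level acts as one common ancestor). The router's own enclosing superdomains
    are not listed as opaque entries, because it sees inside them.
    """
    view: Set[Address] = set()
    for addr in all_domains:
        for i in range(len(own)):
            if addr[:i] == own[:i]:
                view.add(addr[:i + 1])
    ancestors = {own[:i + 1] for i in range(len(own) - 1)}
    return view - ancestors
```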

There are several major problems with using superdomains. One problem is that if a superdomain contains domains with different (possibly contradictory) constraints, then there is no good way of deriving the ToS and policy constraints of the superdomain. The usual techniques are to take either the union or the intersection of the constraints of the subdomains [13]. Both techniques have problems^. Other problems are described in [6, 2]. Some of the problems can be relaxed by having overlapping superdomains, but this increases the storage requirements drastically.

Nimrod [6] and IDPR [13] use the link-state approach, domain-level source routing, and superdomains (non-overlapping superdomains for Nimrod). IDRP [16] uses the path-vector approach and superdomains.

Reference [10] combines the benefits of the path-vector approach and the link-state approach by having two modes: an NR mode, which is an extension of IDRP and is used for the most common ToS and policy constraints; and an SDR mode, which is like IDPR and is used for less frequent ToS and policy requests. This study does not address the scalability of the SDR mode.

^ A valid path is a path that satisfies the ToS and policy constraints of the domains in the path.

^ For example, suppose a router u has two paths P1 and P2 to the destination. Let u have a router neighbor v in another domain. u chooses one of the paths, say P1, and informs v of it. But P1 may violate source policies of v’s domain, while P2 may be a valid path for v.

^ Also referred to as routing domain confederations.

^ For example, if the union is taken, then a subdomain A can be forced to obey constraints of other subdomains; this may eliminate a path through A which is otherwise valid. If the intersection is taken, then a subdomain A can be forced to accept traffic it would otherwise not accept.

In [2], we proposed another protocol based on superdomains. It always finds a valid path if one exists. Both union and intersection policy and ToS constraints are maintained for each visible superdomain. If the union policy constraints of the superdomains on a path are satisfied, then the path is valid. If the intersection policy constraints of a superdomain are satisfied but the union policy constraints are not, the source uses a query protocol to obtain a more detailed “internal” view of the superdomain, and searches again for a valid path. The protocol uses a link-state view-update protocol to handle topology changes, including failures that partition superdomains at any level.
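The decision rule of [2] described above can be sketched as follows. As a simplifying assumption, policy constraints are modelled only as sets of forbidden source domains. The "union" set holds what at least one subdomain forbids, and the "intersection" set holds what every subdomain forbids. [2] itself handles general policy and ToS constraints.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Verdict(Enum):
    VALID = "valid"              # union constraints satisfied on every superdomain
    NEEDS_QUERY = "needs_query"  # some superdomain passes intersection but fails union
    INVALID = "invalid"          # some superdomain fails even its intersection constraints


@dataclass(frozen=True)
class Superdomain:
    name: str
    union_forbidden: FrozenSet[str]         # sources forbidden by at least one subdomain
    intersection_forbidden: FrozenSet[str]  # sources forbidden by every subdomain


def check_path(source: str, path: List[Superdomain]) -> Verdict:
    verdict = Verdict.VALID
    for sd in path:
        if source in sd.intersection_forbidden:
            return Verdict.INVALID       # every subdomain of sd rejects this source
        if source in sd.union_forbidden:
            verdict = Verdict.NEEDS_QUERY  # query sd's internal view and search again
    return verdict
```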

The landmark hierarchy [19, 18] is another approach for solving the scaling problem. Here, each router is a landmark with a radius, and routers within that radius of the landmark maintain a route to it. Landmarks are organized hierarchically, such that the radius of a landmark increases with its level, and the radii of top level landmarks include all routers. Addressing and packet forwarding schemes are introduced. Link-state algorithms cannot be used with the landmark hierarchy, and a thorough study of enforcing ToS and policy constraints with this hierarchy has not been done.
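A minimal sketch of which routers keep a route to a given landmark follows. For illustration only, it assumes that the radius is a hop count and that routers learn routes by a breadth-first search bounded by that radius. [19, 18] define the actual distance measures and protocols.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # undirected router graph as adjacency lists (assumed)


def routes_to_landmark(graph: Graph, landmark: str, radius: int) -> Dict[str, str]:
    """Map every router within `radius` hops of the landmark to its next hop toward it."""
    next_hop: Dict[str, str] = {landmark: landmark}
    dist = {landmark: 0}
    queue = deque([landmark])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the landmark's radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                next_hop[nbr] = node  # nbr forwards toward the landmark via node
                queue.append(nbr)
    return next_hop
```

The result maps routers to their next hop toward the landmark. The landmark itself gains no routes to those routers.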

The landmark hierarchy may look similar to our viewserver hierarchy, but in fact they are quite different. In the landmark hierarchy, nodes within the radius of a landmark maintain a route to the landmark, but the landmark may not have a route to these nodes. In the viewserver hierarchy, a viewserver maintains routes (i.e. a view) to the nodes in its precinct.

Route fragments [7] is an addressing scheme. A destination route fragment, called a route suffix, is a sequence of domain ids from a backbone to the destination domain. A source route fragment, called a route prefix, is the reverse of a route suffix of that domain. There are also route middles, which run from transit domains to transit domains. These addresses are static (i.e. they are not updated with topology changes) and are stored at the name servers. A source queries a name server and obtains destination route suffixes. It then chooses an appropriate route suffix for the destination and concatenates it with its own route prefix (using route middles if the route suffix and route prefix do not intersect). This scheme cannot handle topology changes and does not address handling policy and ToS constraints.
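The concatenation step can be sketched as below. The list-of-domain-ids encoding and the rule of splicing at the first shared domain are assumptions made for illustration. The exact matching rules are those of [7].

```python
from typing import List, Optional, Sequence

Route = List[str]  # hypothetical encoding: a list of domain ids


def concatenate(prefix: Route, suffix: Route,
                middles: Sequence[Route] = ()) -> Optional[Route]:
    """Join the source's route prefix with a destination route suffix.

    prefix runs from the source domain to a backbone; suffix runs from a backbone
    to the destination domain; each middle runs between two transit domains.
    """
    if not prefix or not suffix:
        return None
    # Case 1: the fragments intersect; splice at the first prefix domain found in the suffix.
    pos = {d: i for i, d in enumerate(suffix)}
    for i, d in enumerate(prefix):
        if d in pos:
            return prefix[:i] + suffix[pos[d]:]
    # Case 2: bridge the gap with a route middle from the prefix's end to the suffix's start.
    for mid in middles:
        if mid and mid[0] == prefix[-1] and mid[-1] == suffix[0]:
            return prefix + mid[1:-1] + suffix
    return None
```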


3 Viewserver Hierarchy Query Protocol


In this section, we present our scheme for static network conditions, that is, all links and nodes remain operational. The dynamic case is presented in Section 4.

Conventions: Each domain has a unique id. DomainIds denotes the set of domain-ids. Each node has an id which is unique in its domain. NodeIds denotes the set of node-ids. Thus, a node is totally identified by the combination of its domain’s id and its node-id. TotalIds denotes the set of total node-ids. For a node u, we use domainid(u) to denote the domain-id of u’s domain. We use nodeid(u) and totalid(u) to denote the node-id and total-id of u respectively. For a domain A, we use domainid(A) to denote the domain-id of A. NodeNeighbors(u) denotes the set of node-ids of the neighbors of u. DomainNeighbors(A) denotes the set of domain-ids of the domain neighbors of A. We use the term gateway-id (or viewserver-id) to mean the total-id of a gateway node (or a viewserver node).

In our protocol, a node u uses two kinds of sends. The first kind has the form “Send(m) to v”, where m is the message being sent and v is the total-id of the destination. Here, nodes u and v are neighbors, and the message is sent over the physical link (u, v). If the link is down, we assume that the packet is dropped.

The second kind of send has the form “Send(m) to v using dlsr”, where m and v are as above and dlsr is a domain-level source route between u and v. Here, the message is sent using the intra-domain routing protocols of the domains in dlsr to reach v^. We assume that as long as there is a sequence of up links connecting the domains in dlsr, the message is delivered to v^. If u and v are in the same domain, dlsr equals ⟨⟩.

^ Recall that given a domain-level source route to a destination, we can reach the destination using the intra-domain routing protocols.

^ This involves time-outs, retransmissions, etc. It requires transport protocol support such as TCP.

Views and Viewservers

Domain-level views are maintained by special nodes called viewservers. Each viewserver has a precinct, which is a set of domains around the viewserver, and a static view, which is a domain-level view of the precinct and outgoing edges. The static view includes the ToS and policy constraints of domains in the precinct and of domain-level edges^. Formally, a viewserver x maintains the following:

Precinct_x (⊆ DomainIds). Domain-ids whose view is maintained.

SView_x. Static view of x:

$$SView_x = \{\langle A,\ policy\&tos(A),\ \{\langle B,\ edge\text{-}policy\&tos(A,B)\rangle : B \in \text{subset of } DomainNeighbors(A)\}\rangle : A \in Precinct_x\}$$

SView_x can be implemented as an adjacency list representation of a graph [1]. The intention of SView_x is to obtain domain-level source routes between nodes in Precinct_x. Hence, the choice of domains to include in Precinct_x and the choice of neighbors of domains to include in SView_x are not arbitrary. Precinct_x and SView_x must be connected; that is, between any two domains in Precinct_x there should be a path in SView_x that lies in Precinct_x. Note that SView_x can contain edges to domains outside Precinct_x. We say that a domain A is in the view of a viewserver x if either A is in the precinct of x or SView_x has an edge from a domain in the precinct to A. Note that the precincts and views of different viewservers can be overlapping, identical or disjoint.
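The adjacency-list representation and the two definitions above (being in the view of x, and precinct connectivity) can be sketched as follows. The encoding is hypothetical: SView_x is a dictionary from a domain id to the neighbor domain ids included for it. Edges are treated as directed, because edge attributes may differ per direction.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical encoding of SView_x: domain id -> ids of the neighbor domains included
# for it. ToS and policy attributes are omitted from this sketch.
SView = Dict[str, Set[str]]


def in_view(sview: SView, precinct: Set[str], domain: str) -> bool:
    """True if domain is in the precinct or an SView_x edge from the precinct reaches it."""
    return domain in precinct or any(domain in sview.get(a, set()) for a in precinct)


def _reachable_within(sview: SView, precinct: Set[str], start: str) -> Set[str]:
    """Domains reachable from start using only edges between precinct domains."""
    seen = {start}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in sview.get(a, set()):
            if b in precinct and b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def precinct_connected(sview: SView, precinct: Set[str]) -> bool:
    """Every ordered pair of precinct domains is joined by an SView_x path inside the precinct."""
    return all(_reachable_within(sview, precinct, a) == precinct for a in precinct)
```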

If there is a viewserver x whose view contains both the source and the destination domains, then x’s view can be used to obtain the required domain-level source route to reach the destination. The source needs to reach x to obtain its view. If the source and x are in the same domain, x can be reached using the intra-domain routing protocol. If x is in another domain, then the source needs a domain-level source route to it^. In this case, we assume that the source has a set of fixed domain-level source routes to x.

^ Not all the domain-level edges need to be included. This is because some domains may have many neighbors, causing a big storage requirement.

^ We cannot obtain this domain-level source route from x (a chicken-and-egg problem).

Viewserver Hierarchy

For scaling reasons, we cannot have one large view. Thus, obtaining a domain-level source route between a source and a destination that are far apart involves accumulating the views of a sequence of viewservers. To keep this process efficient, we organize viewservers hierarchically. More precisely, each viewserver is assigned a hierarchy level from 0, 1, ..., with 0 being the top level in the hierarchy. A parent/child relationship between viewservers is defined as follows:


1. Every level i viewserver, i > 0, has a parent viewserver whose level is less than i.

2. If viewserver x is a parent of viewserver y, then x’s view contains y’s domain and y’s view contains x’s domain^.

3. The view of a top level viewserver contains the domains of all other top level viewservers. (Typically, top level viewservers are placed in backbones.)

Note that the third constraint does not mean that all top level viewservers have the same view. In the hierarchy, a parent can have many children and a child can have many parents. We extend the range of the parent-child relationship to ordinary nodes; that is, if Precinct_x contains the domain of node u, we say that u is a child of x, and x is a parent of u (note that an ordinary node does not have a child). We assume that there is at least one parent viewserver for each node.

For a node u, an address is defined to be a sequence ⟨x_0, x_1, ..., x_t⟩ such that x_i for i < t is a viewserver-id, x_0 is a top level viewserver-id, x_t is the total-id of u, and x_i is a parent of x_{i+1}. Note that a node may have many addresses, since the parent-child relationship is many-to-many. If a source wants a domain-level source route to a destination, it first queries the name servers to obtain a set of addresses for the destination. Then, it queries viewservers to obtain an accumulated view containing both its domain and the destination’s domain.
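Because the parent-child relationship is many-to-many, a node's addresses are exactly the parent chains from top level viewservers down to it. The minimal sketch below assumes a hypothetical `parents` map in which top level viewservers have an empty parent list. Constraint 1 (parents have strictly smaller levels) guarantees that the recursion terminates.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, ...]


def addresses(node: str, parents: Dict[str, List[str]]) -> List[Address]:
    """All sequences <x_0, ..., x_t> with x_0 top level, x_t = node, x_i a parent of x_{i+1}.

    parents maps a node or viewserver id to its parent viewservers; top level
    viewservers map to an empty list (or are absent).
    """
    ps = parents.get(node, [])
    if not ps:
        return [(node,)]  # treated as a top level viewserver
    result: List[Address] = []
    for p in ps:
        for prefix in addresses(p, parents):
            result.append(prefix + (node,))
    return result
```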

Querying  the  name  servers  can  be  done  the  same  way  it  is  done  currently  in  the  Internet.  It 
requires  nodes  to  have  a  set  of  fixed  addresses  to  name  servers.  This  is  also  sufficient  in  our  case. 
However,  we  can  improve  the  performance  by  having  a  set  of  fixed  domain-level  source  routes 
instead. 

View-Query  Protocol:  Obtaining  Domain-Level  Source  Routes 

We  now  describe  how  a  domain-level  source  route  is  obtained  (regardless  of  whether  the  source  and 
the  destination  are  in  a  common  view  or  not). 

^ Note that x and y do not have to be in each other’s precinct.

^ In fact, name servers are called domain name servers. However, domain names and the domains used in this paper are different. We use “name servers” to avoid confusion.

We want a sequence of viewservers whose merged views contain both the source and the destination domains. Addresses provide a way to obtain such a sequence, by first going up in the viewserver hierarchy starting from the source node and then going down in the viewserver hierarchy towards the destination node. More precisely, let ⟨s_0, ..., s_t⟩ be an address of the source, and ⟨d_0, ..., d_l⟩ be an address of the destination. Then, the sequence ⟨s_{t-1}, ..., s_0, d_0, ..., d_{l-1}⟩ meets our requirements^. In fact, going up all the way in the hierarchy to top level viewservers may not be necessary. We can stop going up at a viewserver s_i if there is a viewserver d_j, j < l, such that the domain of d_j is in the view of s_i (one special case is where s_i = d_j).
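The construction of the viewserver sequence, including the early stop, can be sketched as below. `view_of` and `domain_of` are hypothetical lookups standing in for viewserver views and domain-ids. Choosing the largest such j (the destination-side viewserver closest to the destination) is our assumption; it matches the "last such one" rule used during forwarding.

```python
from typing import Callable, List, Sequence, Set


def viewserver_sequence(s_addr: Sequence[str], d_addr: Sequence[str],
                        view_of: Callable[[str], Set[str]],
                        domain_of: Callable[[str], str]) -> List[str]:
    """Viewservers whose merged views should contain both the source and destination domains.

    s_addr = <s_0, ..., s_t> ends with the source node; d_addr = <d_0, ..., d_l> ends
    with the destination node. Only viewserver entries are returned.
    """
    ups = list(s_addr[:-1])[::-1]   # s_{t-1}, ..., s_0
    downs = list(d_addr[:-1])       # d_0, ..., d_{l-1}
    seq: List[str] = []
    for s in ups:
        seq.append(s)
        hits = [j for j, d in enumerate(downs) if domain_of(d) in view_of(s)]
        if hits:                    # early stop: some d_j is visible from s
            rest = downs[max(hits):]
            if rest and rest[0] == s:  # special case s_i = d_j: do not list it twice
                rest = rest[1:]
            return seq + rest
    return seq + downs              # fallback: go through the top level
```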

The  view-query  protocol  uses  two  message  types: 

• (RequestView, s_address, d_address)

where s_address and d_address are the addresses for the source and the destination respectively. A RequestView message is sent by a source to obtain an accumulated view containing both the source and the destination domains. When a viewserver receives a RequestView message, it either sends back its view or forwards the request to another viewserver.

• (ReplyView, s_address, d_address, accumview)

where s_address and d_address are as above and accumview is the accumulated view. A ReplyView message is sent by a viewserver to the source or to another viewserver closer to the source. The accumview field in a ReplyView message equals the union of the views of the viewservers the message has visited.

We now describe the events of a source node (see Figure 2). The source node^ sends a RequestView packet containing a source and a destination address to its parent in the source address (using a fixed domain-level source route). When the source receives a ReplyView packet, it chooses a valid path using the accumview in the packet. If it does not find a valid path, it can try again using a different source and/or destination address. Note that the source does not have to throw away the previous accumulated views; it can merge all accumulated views into a richer accumulated view. In fact, it is easy to change the protocol so that the source can also obtain views of individual viewservers to make the accumulated view even richer.
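On the source side, merging accumulated views and searching for a valid route might look like the sketch below. It encodes a view, hypothetically, as a domain adjacency map. It also reduces validity to a per-domain predicate `allowed`, whereas real ToS and policy checks depend on the whole path.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set

View = Dict[str, Set[str]]  # hypothetical encoding: domain -> neighbor domains


def merge_views(*views: View) -> View:
    """Union of several accumulated views, as the source may do across retries."""
    merged: View = {}
    for v in views:
        for a, nbrs in v.items():
            merged.setdefault(a, set()).update(nbrs)
    return merged


def find_valid_route(view: View, src: str, dst: str,
                     allowed: Callable[[str], bool]) -> Optional[List[str]]:
    """Shortest domain-level route from src to dst through domains that pass `allowed`."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        a = queue.popleft()
        if a == dst:
            route: List[str] = []
            node: Optional[str] = a
            while node is not None:
                route.append(node)
                node = prev[node]
            return route[::-1]
        for b in sorted(view.get(a, set())):  # sorted only for deterministic results
            if b not in prev and (b == dst or allowed(b)):
                prev[b] = a
                queue.append(b)
    return None  # caller may retry with other source/destination addresses
```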

The events of a viewserver x are specified in Figure 3. Upon receiving a RequestView packet, x checks if the destination domain is in its precinct^. If it is, x sends back its view in a ReplyView packet. If it is not, x forwards the request packet to another viewserver as follows: x checks whether the domain of any viewserver in the destination address is in its view. If there is such a domain,
^ This is similar to matching route fragments [7]. However, in our case the sequence is computed in a distributed fashion (this is needed to handle topology changes).

^ Or the policy server in the source’s domain.

^ Even though the destination can be in the view of x, its policies and ToS’s are not in the view if it is not in the precinct of x.


Constants

FixedRoutes_u(x), for every viewserver-id x such that x is a parent of u,
    = {⟨⟩} if domainid(u) = domainid(x)
    = {⟨d_1, ..., d_n⟩ : d_i ∈ DomainIds}, a set of domain-level routes to x, otherwise

Events

RequestView_u(s_address, d_address)   {Executed when u wants a valid domain-level source route}
    Let s_address be ⟨s_0, ..., s_{t-1}, u⟩ and dlsr ∈ FixedRoutes_u(s_{t-1});
    Send(RequestView, s_address, d_address) to s_{t-1} using dlsr

Receive_u(ReplyView, s_address, d_address, accumview)
    Choose a valid domain-level source route using accumview;
    If a valid route is not found
        Execute RequestView_u again with another source address and/or destination address

Figure 2: View-query protocol: Events and state of a source u.

Constants

Precinct_x. Precinct of x.

SView_x. Static view of x.

Events

Receive_x(RequestView, s_address, d_address)
    Let d_address be ⟨d_0, ..., d_l⟩;
    if domainid(d_l) ∉ Precinct_x then
        forward_x(RequestView, s_address, d_address, {});
    else forward_x(ReplyView, d_address, s_address, SView_x);   {addresses are switched}
    endif

Receive_x(ReplyView, s_address, d_address, view)
    forward_x(ReplyView, s_address, d_address, view ∪ SView_x)

where procedure forward_x(type, s_address, d_address, view)
    Let s_address be ⟨s_0, ..., s_t⟩, d_address be ⟨d_0, ..., d_l⟩;
    if ∃j : domainid(d_j) in SView_x then
        Let i = max{j : domainid(d_j) in SView_x};
        target := d_i;
    else target := s_i such that s_{i+1} = totalid(x);
    endif;
    dlsr := choose a route to domainid(target) from domainid(x) using SView_x;
    if type = RequestView then
        Send(RequestView, s_address, d_address) to target using dlsr;
    else Send(ReplyView, s_address, d_address, view) to target using dlsr;
    endif

Figure 3: View-query protocol: Events and state of a viewserver x.


x sends the RequestView packet to the last such one. Otherwise, x is a viewserver in the source address and sends the packet to its parent in the source address. (Note that if x is a viewserver in the destination address, its child in the destination address is definitely in its view.)

When a viewserver x receives a ReplyView packet, it merges its view into the accumulated view in the packet. Then it sends the ReplyView packet towards the source node in the same way it would send a RequestView packet towards the destination node (i.e. the roles of the source address and the destination address are switched).
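The target selection inside forward_x (Figure 3) can be expressed as a small function. This is a sketch under assumptions: `view_domains` stands for the set of domain-ids in x's view and `domain_of` for domainid(·). Route selection and sending are omitted.

```python
from typing import Callable, Sequence, Set


def choose_target(x: str, s_addr: Sequence[str], d_addr: Sequence[str],
                  view_domains: Set[str], domain_of: Callable[[str], str]) -> str:
    """Where viewserver x forwards a packet, following the target rule of forward_x.

    For a ReplyView, the caller passes the two addresses switched.
    """
    visible = [j for j, d in enumerate(d_addr) if domain_of(d) in view_domains]
    if visible:
        return d_addr[max(visible)]  # last destination-address entry whose domain x sees
    i = list(s_addr).index(x)        # otherwise x must be on the source address
    if i == 0:
        raise ValueError("top level viewserver sees no entry of the destination address")
    return s_addr[i - 1]             # x's parent in the source address
```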

Above  we  have  described  one  possible  way  of  obtaining  the  accumulated  views.  There  are 
various  other  possibilities,  for  example:  (1)  restricting  the  ReplyView  packet  to  take  the  reverse 
of  the  path  that  the  RequestView  packet  took;  (2)  having  ReplyView  packets  go  all  the  way 
up  in  the  viewserver-hierarchy  for  a  richer  accumulated  view;  (3)  source  polling  the  viewservers 
directly  instead  of  viewservers  forwarding  request/reply  messages  to  each  other;  (4)  not  including 
the  non-transit  stub  domains  other  than  the  source  and  the  destination  domains  in  the  accumview; 
(5)  including  some  source  policy  constraints  and  ToS  requirements  in  the  RequestView  packet, 
and  having  the  viewservers  filter  out  some  domains. 

4  Update  Protocol  for  Dynamic  Network  Conditions 

In this section, we first examine how topology changes such as link/node failures, repairs, and cost changes map into domain-level topology changes. Second, we describe how domain-level topology changes are detected and communicated to viewservers, i.e. the view-update protocol. Third, we modify the view-query protocol appropriately.

Mapping  Topology  Changes  to  Domain-Level  Topology  Changes 

Costs are associated with domain-level edges. The cost of the domain-level edge (A, B) equals a vector of values if at least one link from A to B is up; each cost value indicates how expensive it is to cross domain A to reach domain B according to some criterion such as delay, throughput, reliability, etc. The cost equals ∞ if all links from A to B are down^. Each cost value of a domain-level edge (A, B) can be derived from the cost values of the intra-domain routes in A and the links from A to B [4]^.

^ Note that if a gateway connecting A to B is down, its links are also considered to be down.

^ For example, the delay of a domain-level edge (A, B) can be calculated as the maximum/average delay of the routes from any gateway in A to the first gateway in B.
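Following the footnote's example, the delay component of an edge cost could be derived as below. The per-gateway input list and the two aggregation modes are illustrative assumptions. The other components of the cost vector (throughput, reliability, etc.) would be computed analogously.

```python
import math
from typing import List, Optional


def edge_delay(route_delays: List[Optional[float]], mode: str = "max") -> float:
    """Delay component of the cost of domain-level edge (A, B).

    route_delays has one entry per gateway of A: the delay of its route to the first
    gateway in B (intra-domain part plus inter-domain link), or None if that route is down.
    """
    up = [d for d in route_delays if d is not None]
    if not up:
        return math.inf  # all links from A to B are down
    if mode == "max":
        return max(up)
    if mode == "average":
        return sum(up) / len(up)
    raise ValueError(f"unknown mode: {mode}")
```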


Link cost changes and link/node failures and repairs correspond to cost changes, failures and repairs of domain-level edges. Link/node failures can also partition a domain into cells [15]. A cell is a maximal subset of nodes of a domain that can reach each other without leaving the domain. With partitioning, some nodes as well as some neighbor domains may not be accessible by all cells. In the same way, link/node repairs may merge cells into bigger cells. We identify a cell with the minimum node-id of the gateways in the cell. In this paper, for uniformity we treat an unpartitioned domain as a domain with one cell; we do not consider cells that do not contain gateways, since such cells do not affect inter-domain routes.
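Cells are simply connected components of the domain's up intra-domain links. The minimal sketch below assumes undirected links and integer node-ids. It also applies the two conventions above: a cell is identified by its minimum gateway node-id, and cells without gateways are dropped.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cells(nodes: Iterable[int], up_links: Iterable[Tuple[int, int]],
          gateways: Set[int]) -> Dict[int, Set[int]]:
    """Partition a domain into cells keyed by cell-id (minimum gateway node-id)."""
    adj: Dict[int, List[int]] = {n: [] for n in nodes}
    for a, b in up_links:
        adj[a].append(b)
        adj[b].append(a)
    seen: Set[int] = set()
    result: Dict[int, Set[int]] = {}
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]  # depth-first search for one component
        seen.add(start)
        while stack:
            n = stack.pop()
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    comp.add(m)
                    stack.append(m)
        gws = comp & gateways
        if gws:  # cells without gateways do not affect inter-domain routes
            result[min(gws)] = comp
    return result
```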

If  a  domain  gets  partitioned,  its  vertex  in  the  domain-level  views  should  be  split  into  as  many 
pieces  as  there  are  cells.  And  when  the  cells  merge,  the  corresponding  vertices  should  be  merged 
as  well. 

Since a domain can be partitioned into many cells, domain-level source routes now include cell-ids as well. Hence, the intra-domain routing protocol of a domain should include a route to each reachable neighbor domain cell^.

View-Update  Protocol:  Updating  Domain-Level  Views 

Viewservers do not communicate with each other to maintain their views. Gateways detect and communicate domain-level topology changes to viewservers. Each gateway periodically (and optionally after a change in the intra-domain routing table) inspects its intra-domain routing table and determines the cell it belongs to. For each cell, only the gateway whose node-id is the cell-id (i.e. the gateway with the minimum node-id) is responsible for communicating domain-level topology changes. We refer to this gateway as the reporting gateway. Reporting gateways compute the domain-level edge costs for each neighbor domain cell, and report them to parent viewservers. They are also responsible for informing the viewservers of the creation and deletion of cells.

^ Our cells are like the domain components of IDPR [13].

^ This involves the following changes in the intra-domain routing protocol: (1) Whenever the cell-id of a gateway changes, it reports its new cell-id to adjacent gateways in neighbor domains. When they receive this information, they update their intra-domain routes to include the new cell-id. (2) Usually when a node recovers from a failure, it queries its neighbors in its domain for their intra-domain routes. When a gateway recovers, it should also query adjacent gateways in neighbor domains for their cell-ids.

^ For efficiency, the flood area can be implemented by a radius and some forwarding limits (e.g. do not flood

The communication between a reporting gateway and viewservers is done by flooding over a set of domains. This set is referred to as the flood area^. The topology of a flood area must be a connected graph. Due to the nature of flooding, a viewserver can receive information out of order for a domain cell. In order to avoid old information replacing new information, each gateway includes successively increasing time stamps in the messages it sends.
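The flooding rule can be sketched as follows. For illustration, each domain is assumed to hold one table of the latest accepted timestamp per domain cell, standing in for the viewservers and gateways of that domain. A copy that is not newer is neither accepted nor forwarded.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, int]  # (domain id, cell id) of the reporting gateway's cell


def flood(domain_graph: Dict[str, Set[str]], flood_area: Set[str], origin: str,
          key: Key, timestamp: int, latest: Dict[str, Dict[Key, int]]) -> Set[str]:
    """Flood one update from `origin` over `flood_area`; return the domains that accepted it.

    latest[d][key] is the largest timestamp domain d has accepted for this cell, so old
    information never replaces new information and each copy stops after one visit.
    """
    accepted: Set[str] = set()
    frontier = [origin]
    while frontier:
        d = frontier.pop()
        if d not in flood_area:
            continue  # never flood outside the flood area
        if latest.setdefault(d, {}).get(key, -1) >= timestamp:
            continue  # stale or duplicate copy
        latest[d][key] = timestamp
        accepted.add(d)
        frontier.extend(domain_graph.get(d, set()))  # forward to neighbor domains
    return accepted
```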

Due to node and link failures, communication between a reporting gateway and a viewserver can fail, resulting in the viewserver having out-of-date information. To eliminate such information, a viewserver deletes any information about a domain cell if it is older than a time-to-die period. We assume that gateways send messages more often than the time-to-die value (to avoid false removal).

When a viewserver learns of a new domain cell, it adds it to its view. To avoid re-adding a domain cell which was just deleted^, when a viewserver receives a delete domain cell request, it only marks the domain cell as deleted (and removes the entry after the time-to-die period).
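The time-to-die and deletion-marking rules might be maintained as in the sketch below. The class, its method names and the numeric clock are hypothetical. The fields mirror the timestamp, expirytime, deleted and cost set of DView_x described with Figure 5.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

CellKey = Tuple[str, int]  # (domain id, cell id), written A:g in the text


@dataclass
class CellEntry:
    timestamp: int
    expirytime: float
    deleted: bool = False
    costs: Dict[CellKey, float] = field(default_factory=dict)


class DynamicView:
    """Hypothetical sketch of DView_x maintenance at a viewserver."""

    def __init__(self, time_to_die: float) -> None:
        self.time_to_die = time_to_die
        self.cells: Dict[CellKey, CellEntry] = {}

    def _is_newer(self, key: CellKey, timestamp: int) -> bool:
        entry = self.cells.get(key)
        return entry is None or timestamp > entry.timestamp

    def update_cell(self, key: CellKey, timestamp: int,
                    costs: Dict[CellKey, float], now: float) -> None:
        if self._is_newer(key, timestamp):  # ignore out-of-order (older) copies
            self.cells[key] = CellEntry(timestamp, now + self.time_to_die, False, dict(costs))

    def delete_cell(self, key: CellKey, timestamp: int, now: float) -> None:
        # Only mark as deleted, so a delayed older UpdateCell cannot re-add the cell.
        if self._is_newer(key, timestamp):
            self.cells[key] = CellEntry(timestamp, now + self.time_to_die, True)

    def expire(self, now: float) -> None:
        # Drop entries (deleted or not) that were not refreshed within the time-to-die period.
        self.cells = {k: e for k, e in self.cells.items() if e.expirytime > now}

    def live_cells(self) -> Set[CellKey]:
        return {k for k, e in self.cells.items() if not e.deleted}
```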

The  view-update  protocol  uses  two  types  of  messages  as  follows: 

• (UpdateCell, domainid, cellid, timestamp, floodarea, ncostset)

is sent by the reporting gateway to inform the viewservers about the current domain-level edge costs of its cell. Here, domainid, cellid and timestamp indicate the domain, the cell and the time stamp of the reporting gateway, ncostset contains a cost for each neighbor domain cell, and floodarea is the set of domains that this message is to be sent over.

• (DeleteCell, domainid, cellid, timestamp, floodarea)

where the parameters are as in the UpdateCell message. It is sent by a reporting gateway when it becomes non-reporting (because its cell expanded to include a gateway with a lower id), to inform viewservers to delete the gateway’s old cell.

The state maintained by a gateway g is listed in Figure 4. Note that LocalViewservers_g and LocalGateways_g can be empty. IntraDomainRT_g contains a route (next-hop or source) for every reachable node of the domain and for every reachable neighbor domain cell^. We assume that consecutive reads of Clock_g return increasing values.

The state maintained by a viewserver x is listed in Figure 5. DView_x is the dynamic part of x’s view. For each domain cell^ known to x, DView_x stores a timestamp field which equals the
beyond  backbones)  instead  of  a  set. 

^ If the domain cell was removed, the timestamp for that domain cell is also lost.

^ IntraDomainRT_g is a view in case of a link-state routing protocol or a distance table in case of a distance-vector routing protocol.

^ We use A:g to denote the cell g of domain A.


Constants:

LocalViewservers_g (⊆ TotalIds). Set of viewservers in g’s domain.
LocalGateways_g (⊆ TotalIds). Set of gateways in g’s domain excluding g.
AdjForeignGateways_g (⊆ TotalIds). Set of adjacent gateways in other domains.
FloodArea_g (⊆ DomainIds). The flood area of the domain (includes the domain of g).

Variables:

IntraDomainRT_g. Intra-domain routing table of g. Initially contains no entries.
CellId_g : NodeIds. The id of g’s cell. Initially = ∞.
Clock_g : Integer. Clock of g.

Figure 4: State of a gateway g.


Constants:

Precinct_x. Precinct of x.

SView_x. Static view of x.

TimeToDie_x : Integer. Time-to-die value.

Variables:

DView_x. Dynamic view of x:

$$DView_x = \{\langle A{:}g,\ timestamp,\ expirytime,\ deleted,\ \{\langle B{:}h,\ cost\rangle : B \in DomainNeighbors(A) \wedge h \in NodeIds \cup \{*\}\}\rangle : A \in Precinct_x \wedge g \in NodeIds\}$$

Clock_x : Integer. Clock of x.

Figure 5: State of a viewserver x.

largest timestamp received for this domain cell, an expirytime field which equals the end of the time-to-die period for this domain cell, a deleted field which marks whether or not the domain cell is deleted, and a cost set which indicates a cost for every neighbor domain cell whose domain is in SView_x. The cell-id of a neighbor domain equals * if no cell of the neighbor domain is reachable. The events of a gateway g and a viewserver x are specified in Appendix A.

Changes  to  View-Query  Protocol 

We  now  enumerate  the  changes  needed  to  adapt  the  view-query  protocol  to  the  dynamic  case  (the 
formal  specification  is  omitted  for  space  reasons). 

Due to link and node failures, RequestView and ReplyView packets can get lost. Hence, the source may never receive a ReplyView packet after it initiates a request. Thus, the source should try again after a time-out period.
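The retry rule can be sketched as a bounded loop. This is a minimal Python sketch, not the protocol's specification: `send_request` and `wait_for_reply` are hypothetical transport hooks, and the cap `max_attempts` is our assumption (the text only says the source should try again after a time-out period).

```python
from typing import Callable, Optional


def request_view_with_retry(
    send_request: Callable[[], None],
    wait_for_reply: Callable[[float], Optional[dict]],
    timeout: float,
    max_attempts: int,
) -> Optional[dict]:
    """Send RequestView and resend it whenever no ReplyView arrives in time.

    send_request and wait_for_reply are hypothetical hooks into the transport;
    wait_for_reply(timeout) returns the reply, or None if the time-out expires.
    """
    for _ in range(max_attempts):
        send_request()                    # (re)issue the RequestView packet
        reply = wait_for_reply(timeout)   # wait at most `timeout` seconds
        if reply is not None:
            return reply                  # a ReplyView arrived
    return None                           # give up after max_attempts tries
```

Bounding the number of attempts keeps a source from retrying forever when the destination is unreachable; an unbounded loop with a growing time-out would be an equally reasonable reading of the text.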

When a viewserver receives a RequestView message, in the static case it replies with its view if the destination domain is in its precinct. Now, because domain-level edges can fail, it must also check its dynamic view and reply with its view only if its dynamic view contains a path to the destination. Similarly, during forwarding of RequestView and ReplyView packets, a viewserver, while checking whether a domain is in its view, should also check whether its dynamic view contains a path to it.
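A minimal sketch of the reply test, assuming the dynamic view is available as a hypothetical adjacency map from each domain to the neighbor domains whose edges are currently up, and assuming the path is searched from the viewserver's own domain. The names `has_path` and `should_reply` are illustrative only.

```python
from collections import deque
from typing import Dict, Hashable, Set


def has_path(dynamic_view: Dict[Hashable, Set[Hashable]], src, dst) -> bool:
    """Breadth-first search over the edges currently present in the dynamic view."""
    if src == dst:
        return True
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in dynamic_view.get(u, ()):
            if v == dst:
                return True
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def should_reply(own_domain, precinct: Set[Hashable],
                 dynamic_view: Dict[Hashable, Set[Hashable]], dest) -> bool:
    """Reply to RequestView only if dest is in the precinct AND still reachable."""
    return dest in precinct and has_path(dynamic_view, own_domain, dest)
```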

Finally,  when  a  viewserver  sends  a  message  to  a  node  whose  domain  is  partitioned,  it  should 
send  a  copy  of  the  message  to  each  cell  of  the  domain.  This  is  because  a  viewserver  does  not  know 
which  cell  contains  the  node. 

5  Evaluation 

Many inter-domain routing protocols have been proposed, based on various kinds of hierarchies. How do these protocols compare against each other and against the simple approach? To answer this question, we need a model in which we can define internetwork topologies, policy/ToS constraints, inter-domain routing hierarchies, and evaluation measures (e.g. memory and time requirements) for inter-domain routing protocols.

In  this  section,  we  first  present  such  a  model,  and  then  use  the  model  to  evaluate  our  viewserver 
hierarchy  and  compare  it  to  the  simple  approach.  Our  evaluation  measures  are  the  amount  of 
memory  required  at  the  source  and  at  the  routers,  the  amount  of  time  needed  to  construct  a  path, 
and  the  number  of  paths  found  out  of  the  total  number  of  valid  paths. 

Even though the model described here can be applied to other inter-domain routing protocols, we have not done so, and hence have not compared them against our viewserver hierarchy. This is because of lack of time, and because precise definitions of the hierarchies in these protocols are not available. For example, to do a fair evaluation of IDPR [13], we need precise guidelines for how to group domains into super-domains, and how to choose between the union and intersection methods when defining policy/ToS constraints of super-domains. In fact, these protocols have not been evaluated in a way that allows us to compare them to the viewserver hierarchy. To the best of our knowledge, this paper is the first to evaluate a hierarchical inter-domain routing protocol against explicitly stated policy constraints.


5.1  Evaluation  Model 

We first describe our method of generating topologies and policy/ToS constraints. We then describe the evaluation measures.

Generating  Internetwork  Topologies 

For  our  purposes,  an  internetwork  topology  is  a  directed  graph  where  the  nodes  correspond  to 
domains  and  the  edges  correspond  to  domain-level  connections.  However,  an  arbitrary  graph  will 
not  do.  The  topology  should  have  the  characteristics  of  a  real  internetwork,  like  the  Internet.  That 
is, it should have backbones, regionals, MANs, LANs, etc.; these should be connected hierarchically
(e.g.  regionals  to  backbones),  but  “non-hierarchical”  connections  (e.g.  “back-doors”)  should  also 
be  present. 

For brevity, we refer to backbones as class 0 domains, regionals as class 1 domains, metropolitan-area domains and providers as class 2 domains, and campus and local-area domains as class 3 domains. A (strictly) hierarchical interconnection of domains means that class 0 domains are connected to each other, and for i > 0, class i domains are connected to class i − 1 domains. As mentioned above, we also want some “non-hierarchical” connections, i.e., domain-level edges between domains irrespective of their classes (e.g. from a campus domain to another campus domain or to a backbone domain).

In reality, domains span geographical regions, and domain-level edges are usually between domains that are geographically close (e.g. the University of Maryland campus domain is connected to the SURANET regional domain, which is on the east coast). A class i domain usually spans a larger geographical region than a class i + 1 domain. To generate such interconnections, we associate a “region” attribute with each domain. The intention is that two domains with the same region are geographically close.

The region of a class i domain has the form $r_0.r_1.\cdots.r_i$, where the $r_k$'s are integers. For example, the region of a class 3 domain can be 1.2.3.4. For brevity, we refer to the region of a class i domain as a class i region.

Note that regions have their own hierarchy. Class 0 regions are the top-level regions. We say that a class i region $r_0.r_1.\cdots.r_i$ is contained in the class i − 1 region $r_0.r_1.\cdots.r_{i-1}$ (where i > 0). Containment is transitive. Thus region 1.2.3.4 is contained in regions 1.2.3, 1.2 and 1.
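Because a region is a dot-separated list of integers, containment reduces to a proper-prefix test. The sketch below uses hypothetical helper names and encodes regions as integer tuples; it mirrors the definition above, including transitivity (any shorter prefix contains the region).

```python
from typing import Tuple

Region = Tuple[int, ...]   # region "r0.r1.....ri" as the tuple (r0, r1, ..., ri)


def parse_region(text: str) -> Region:
    return tuple(int(part) for part in text.split("."))


def region_class(region: Region) -> int:
    return len(region) - 1          # a class i region has i + 1 components


def contained_in(inner: Region, outer: Region) -> bool:
    """True if `inner` is (transitively) contained in `outer`: outer is a proper prefix."""
    return len(outer) < len(inner) and inner[:len(outer)] == outer
```

For example, `contained_in(parse_region("1.2.3.4"), parse_region("1.2"))` holds, while a region is not considered contained in itself.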


Figure  6:  Regions 

Given any pair of domains, we classify them as local, remote or far, based on their regions. Let X be a class i domain and Y a class j domain, and (without loss of generality) let i ≤ j. X and Y are local if they are in the same class i region. For example, in Figure 6, A is local to J, K, M, N, O, and Q. X and Y are remote if they are not in the same class i region but they are in the same class i − 1 region, or if i = 0. For example, in Figure 6, some of the domains A is remote to are D, E, F, and L. X and Y are far if they are neither local nor remote. For example, in Figure 6, A is far from J.

We refer to a domain-level edge as local (remote, or far) if the two domains it connects are local (remote, or far).
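The local/remote/far rules translate directly into prefix comparisons. This sketch assumes regions are encoded as integer tuples, so a class i region has i + 1 components, and it swaps the pair so that X has the smaller class; `classify` and `classify_edge` are illustrative names.

```python
from typing import Dict, Tuple

Region = Tuple[int, ...]   # a class i region is a tuple of i + 1 integers


def classify(region_x: Region, region_y: Region) -> str:
    """Classify a pair of domains as 'local', 'remote' or 'far' from their regions."""
    # Order the pair so that X has the smaller class i (WLOG i <= j).
    if len(region_x) > len(region_y):
        region_x, region_y = region_y, region_x
    i = len(region_x) - 1
    # Same class i region: Y's region truncated to class i equals X's region.
    if region_y[:i + 1] == region_x:
        return "local"
    # Remote if i == 0, or if both lie in the same class i - 1 region.
    if i == 0 or region_y[:i] == region_x[:i]:
        return "remote"
    return "far"


def classify_edge(regions: Dict[str, Region], u: str, v: str) -> str:
    """An edge is local/remote/far if the two domains it connects are."""
    return classify(regions[u], regions[v])
```

Note that with i = 0 the function never returns "far", matching the rule that two domains whose smaller class is 0 are either local or remote.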

We use the following procedure to generate internetwork topologies:

•  We first specify the number of domain classes, and the number of domains in each class.

•  We next specify the regions. Note that the number of region classes equals the number of domain classes. We specify the number of class 0 regions. For each class i > 0, we specify a branching factor, which creates that many class i regions in each class i − 1 region. (That is, if there are two class 0 regions and the class 1 branching factor equals three, then there are six class 1 regions.)

•  For each class i, we randomly map the class i domains into the class i regions. Note that several domains can be mapped to the same region, and some regions may have no domain mapped into them.

•  For every class i and every class j, j ≤ i, we specify the number of local, remote and far edges to be introduced between class i domains and class j domains. The end points of the edges are chosen randomly (within the specified constraints).

We ensure that the internetwork topology is connected by ensuring that the subgraph of class 0 domains is connected, and each class i domain, for i > 0, is connected to a local class i − 1 domain.
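The region construction, the random mapping, and the connectivity guarantee can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the generator used for the experiments: domains are named by (class, index) pairs, the class 0 subgraph is made connected with a random chain, class i domains are mapped only into regions whose parent region holds a class i − 1 domain (so a local parent always exists), edges are added in both directions, and the extra local/remote/far edges per class pair are omitted.

```python
import random
from typing import Dict, List, Set, Tuple

Region = Tuple[int, ...]
Domain = Tuple[int, int]          # (class, index within the class)


def make_regions(num_class0: int, branching: List[int]) -> List[List[Region]]:
    """regions[i] lists every class i region; branching[i - 1] is the class i factor."""
    regions = [[(r,) for r in range(num_class0)]]
    for b in branching:
        regions.append([p + (k,) for p in regions[-1] for k in range(b)])
    return regions


def generate_connected_core(domains_per_class: List[int], num_class0: int,
                            branching: List[int], seed: int = 0):
    """Map domains to regions and add only the edges that keep the graph connected."""
    rng = random.Random(seed)
    regions = make_regions(num_class0, branching)
    region_of: Dict[Domain, Region] = {}
    for c, n in enumerate(domains_per_class):
        if c == 0:
            candidates = regions[0]
        else:
            # Assumption: use only regions whose parent holds a class c-1 domain.
            occupied = {region_of[(c - 1, j)] for j in range(domains_per_class[c - 1])}
            candidates = [r for r in regions[c] if r[:c] in occupied]
        for k in range(n):
            region_of[(c, k)] = rng.choice(candidates)

    edges: Set[Tuple[Domain, Domain]] = set()

    def link(a: Domain, b: Domain) -> None:
        edges.add((a, b))
        edges.add((b, a))          # assumption: both directions are present

    # Class 0 subgraph: a random chain keeps it connected.
    class0 = [(0, k) for k in range(domains_per_class[0])]
    rng.shuffle(class0)
    for a, b in zip(class0, class0[1:]):
        link(a, b)
    # Every class c > 0 domain gets an edge to a local class c-1 domain.
    for c in range(1, len(domains_per_class)):
        for k in range(domains_per_class[c]):
            parent_region = region_of[(c, k)][:c]
            parents = [(c - 1, j) for j in range(domains_per_class[c - 1])
                       if region_of[(c - 1, j)] == parent_region]
            link((c, k), rng.choice(parents))
    return region_of, edges
```

`make_regions(2, [3])` reproduces the example above (six class 1 regions). With four class 0 regions and a branching factor of 4 at every level, `make_regions(4, [4, 4, 4])` yields 4, 16, 64 and 256 regions, the region counts listed for Internetwork 3 in Table 6.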

Choosing  Policy/ToS  Constraints 

We chose a simple scheme to model policy/ToS constraints. Each domain is assigned a color: green or red. For each domain class, we specify the percentage of green domains in that class, and then randomly choose a color for each domain in that class.

A valid route from a source to a destination is one that does not visit any red intermediate domains; the source and destination are allowed to be red. Notice that this models transit policy/ToS constraints. We are working on extending this model to source policy/ToS constraints.
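The color model and the validity rule fit in a few lines. In this sketch, each domain is colored green independently with its class's green fraction (one reading of “randomly choose a color”; an exact per-class quota would also fit the text), and `is_valid_route` checks only the intermediate domains. Function names are illustrative.

```python
import random
from typing import Dict, List, Sequence


def assign_colors(domains_by_class: List[List[str]],
                  green_fraction: List[float], seed: int = 0) -> Dict[str, str]:
    """Color each domain green with its class's green fraction, else red."""
    rng = random.Random(seed)
    color: Dict[str, str] = {}
    for domains, frac in zip(domains_by_class, green_fraction):
        for d in domains:
            color[d] = "green" if rng.random() < frac else "red"
    return color


def is_valid_route(route: Sequence[str], color: Dict[str, str]) -> bool:
    """Valid iff no intermediate domain is red; the endpoints may be red."""
    return all(color[d] == "green" for d in route[1:-1])
```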

Computing  Evaluation  Measures 

The evaluation measures of most interest for an inter-domain routing protocol are its memory and time requirements, and the number of valid paths it finds (and their lengths) in comparison to the number of available valid paths (and their lengths) in the internetwork (e.g. could it find the shortest valid path in the internetwork?).

The only analysis method we have at present is to numerically compute the evaluation measures for a variety of source-destination pairs. Because we use internetwork topologies of large sizes, it is not feasible to compute them for all possible source-destination pairs. We randomly choose a set of source-destination pairs that satisfy the following conditions: (1) the source and destination domains are different, and (2) there exists a valid path from the source domain to the destination domain in the internetwork topology. (Note that the simple scheme would always find such a path.)

For a source-destination pair, we refer to the length of the shortest valid path in the internetwork topology as the shortest-path length. Since the number of paths between a source-destination pair is potentially very large (factorial in the number of domains), and we are not interested in paths that are too long, we only count the paths whose lengths are not more than the shortest-path length plus 2.
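One way to compute the shortest-path length and the bounded path counts is a breadth-first search for the shortest valid path followed by a depth-first enumeration of simple valid paths of at most shortest-path length plus 2 hops. The sketch assumes path length is measured in domain-level hops and that paths never revisit a domain; the graph and color maps are hypothetical inputs.

```python
from collections import deque
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]          # domain -> successor domains
Colors = Dict[str, str]              # domain -> "green" or "red"


def can_transit(d: str, src: str, color: Colors) -> bool:
    return d == src or color[d] == "green"


def shortest_valid_length(g: Graph, color: Colors, src: str, dst: str) -> Optional[int]:
    """Shortest valid path length in hops, or None if no valid path exists."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        if not can_transit(u, src, color):
            continue                  # red intermediate: cannot be crossed
        for v in g.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def count_valid_paths(g: Graph, color: Colors, src: str, dst: str,
                      max_len: int) -> Dict[int, int]:
    """Number of simple valid paths of each exact length <= max_len."""
    counts: Dict[int, int] = {}

    def dfs(u: str, visited: Set[str], length: int) -> None:
        if u == dst:
            counts[length] = counts.get(length, 0) + 1
            return
        if length == max_len or not can_transit(u, src, color):
            return
        for v in g.get(u, ()):
            if v not in visited:
                visited.add(v)
                dfs(v, visited, length + 1)
                visited.remove(v)

    dfs(src, {src}, 0)
    return counts
```

Calling `count_valid_paths(g, color, s, d, spl + 2)` with `spl = shortest_valid_length(g, color, s, d)` gives per-length counts; whether the paper's spl, spl + 1 and spl + 2 columns are exact or cumulative counts is not stated, so this sketch reports exact lengths.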

The evaluation measures described above are protocol independent. However, there are also important evaluation measures that are protocol dependent (e.g. the number of levels traversed in some particular hierarchy). Because of this, we postpone the precise definitions of the evaluation measures to the next subsection (their definition depends on the viewserver hierarchy).

5.2  Application  to  Viewserver  Protocol 

We  have  used  the  above  model  to  evaluate  our  viewserver  protocol  for  several  different  viewserver 
hierarchies  and  query  methods.  We  first  describe  the  different  viewserver  schemes  evaluated.  Please 
refer  to  Figure  6  in  the  following  discussion. 

The first viewserver scheme is referred to as base. It has exactly one viewserver in each domain. Each viewserver is identified by its domain-id. The domains in a viewserver's precinct consist of its domain and the neighboring domains. The edges in the viewserver's view consist of the edges between the domains in the precinct, and edges outgoing from domains in the precinct to domains not in the precinct. For example, the precinct of viewserver A (i.e. the viewserver in domain A) consists of domains A, B, J; the edges in the view of viewserver A consist of domain-level edges (A, B), (A, J), (B, J), (J, M), (J, K), (J, F), and (J, D).
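The base precinct and view can be computed directly from the domain-level graph. This sketch represents the topology as a hypothetical map from each domain to its successor domains; a view's edges are exactly the directed edges whose tail lies in the precinct, which covers both edges inside the precinct and edges leaving it. The example graph is a partial reconstruction of Figure 6 using only the edges named in the text, treated as bidirectional (an assumption).

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]    # domain -> successor domains (directed edges)
Edge = Tuple[str, str]


def base_precinct(g: Graph, d: str) -> Set[str]:
    """Base scheme: the viewserver's own domain plus its neighboring domains."""
    return {d} | set(g.get(d, ()))


def view_edges(g: Graph, precinct: Set[str]) -> Set[Edge]:
    """Edges between precinct domains plus edges leaving the precinct, i.e.
    every directed edge whose tail is in the precinct."""
    return {(u, v) for u in precinct for v in g.get(u, ())}


# Edges of Figure 6 named in the text, treated as bidirectional (an assumption).
FIG6_PAIRS = [("A", "B"), ("A", "J"), ("B", "J"), ("J", "M"), ("J", "K"),
              ("J", "F"), ("J", "D"), ("F", "G")]
fig6: Graph = {}
for u, v in FIG6_PAIRS:
    fig6.setdefault(u, set()).add(v)
    fig6.setdefault(v, set()).add(u)

precinct_a = base_precinct(fig6, "A")
view_a = view_edges(fig6, precinct_a)
```

On this graph `precinct_a` is {A, B, J}, and `view_a`, read as unordered pairs, contains exactly the seven edges listed above; (F, G) is not in the view.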

As for the viewserver hierarchy, a viewserver's level is defined to be the class of its domain. That is, a viewserver in a class i domain is a level i viewserver. For each level i viewserver, i > 0, its parent viewserver is chosen randomly from the level i − 1 viewservers in the parent region such that there is a domain-level edge between the viewserver's domain and the parent viewserver's domain. For example, for viewserver C, we can pick viewserver J or K; suppose we pick J. For viewserver J, we have no choice but to pick M (N and O are not connected to J). For M, we pick P (out of P and Q).
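Parent selection and address construction can be sketched as follows. The names `choose_parent` and `address`, and the maps for levels, regions, and domain-level neighbors, are hypothetical; regions are integer tuples, so a level i viewserver's parent region is the first i components of its own region.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Region = Tuple[int, ...]


def choose_parent(vs: str, level: Dict[str, int], region: Dict[str, Region],
                  neighbors: Dict[str, Set[str]],
                  rng: random.Random) -> Optional[str]:
    """Random level i-1 viewserver in vs's parent region whose domain shares a
    domain-level edge with vs's domain; None for level 0 or if none exists."""
    i = level[vs]
    if i == 0:
        return None
    parent_region = region[vs][:i]            # class i-1 prefix of vs's region
    candidates = sorted(u for u in neighbors.get(vs, ())
                        if level[u] == i - 1 and region[u] == parent_region)
    return rng.choice(candidates) if candidates else None


def address(vs: str, parent: Dict[str, Optional[str]]) -> str:
    """Viewserver-address: ids from the top-level ancestor down to vs."""
    chain: List[str] = []
    node: Optional[str] = vs
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return ".".join(reversed(chain))
```

With the parent chain A → J → M → P, `address("A", {"A": "J", "J": "M", "M": "P", "P": None})` returns "P.M.J.A".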

We use only one address for each domain. The viewserver-address of a stub domain is the concatenation of four viewserver (i.e. domain) ids. Thus, the address of A is P.M.J.A. Similarly, the address of H is P.M.K.H. To obtain a route between A and H, it suffices to obtain views of
viewservers  A, 

The second viewserver scheme is referred to as base-QT (where the QT stands for “query up to top”). It is identical to base except that during the query protocol all the viewservers in the source and the destination addresses are queried. That is, to obtain a route between A and H, the views of A, J, M, P, K, and H are obtained.
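In the QT schemes the set of viewservers to query follows directly from the two addresses, and the merged view is the union of the returned views. A minimal sketch with hypothetical names; counting a repeated id (such as a shared top-level viewserver) once per address is our assumption, chosen because it gives eight queries for two four-level addresses, matching the 8 / 8.00 / 8 entries reported for the QT schemes in Table 2.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def qt_query_list(src_addr: str, dst_addr: str) -> List[str]:
    """QT schemes: every viewserver id in the source and destination addresses."""
    return src_addr.split(".") + dst_addr.split(".")


def merge_views(views: Dict[str, Set[Edge]], queried: List[str]) -> Set[Edge]:
    """Merged view = union of the edge sets returned by the queried viewservers."""
    merged: Set[Edge] = set()
    for vs in queried:
        merged |= views.get(vs, set())
    return merged
```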

The third viewserver scheme is referred to as locals. It is identical to base except that now a viewserver's precinct also contains domains that have the same region as the viewserver's domain. That is, the precinct of viewserver A has the domains A, B, J, and C. Note that in this scheme a viewserver's view is not necessarily connected. For example, if the edge (C, J) is removed, the view of viewserver A is no longer connected. (In Section 3, we said that the view of a viewserver should be connected. Here we have relaxed this condition to simplify testing.)

The fourth viewserver scheme is referred to as locals-QT. It is identical to locals except that during the query protocol all the viewservers in the source and the destination addresses are queried.

The fifth viewserver scheme is referred to as vertex-extension. It is identical to base except that viewserver precincts are extended as follows. Let P denote the precinct of a viewserver in the base scheme. For each domain X in P, if there is an edge from domain X to domain Y and Y is not in P, domain Y is added to the precinct; among Y's edges, only the ones to domains in P are added to the view. In the example, domains M, K, F, and D are added to the precinct of A, but outgoing edges of these domains to other domains are not included (e.g. (F, G) is not included). The advantage of this scheme is that even though it increases the precinct size by a factor which is potentially greater than 2, it increases the number of edges stored in the view by a factor less than 2. (In fact, if the same edge cost and edge policies are used for both directions of domain-level edges, then the only other information that needs to be stored by the viewservers is the policy constraints of the newly added domains.)
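A sketch of the vertex-extension step follows. It assumes the topology is given as a hypothetical map from each domain to its successor domains, and returns the extended precinct together with the view: the base view plus, for each added domain Y, only Y's edges back into P. The example graph again uses only the Figure 6 edges named in the text, treated as bidirectional.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]    # domain -> successor domains (directed edges)
Edge = Tuple[str, str]


def vertex_extension(g: Graph, precinct: Set[str]) -> Tuple[Set[str], Set[Edge]]:
    """Extend a base precinct P with every domain Y one edge away from P;
    of Y's edges, only those leading back into P join the view."""
    base_view = {(u, v) for u in precinct for v in g.get(u, ())}
    added = {v for _, v in base_view} - precinct
    back_edges = {(y, x) for y in added for x in g.get(y, ()) if x in precinct}
    return precinct | added, base_view | back_edges


# Figure 6 edges named in the text, treated as bidirectional (an assumption).
PAIRS = [("A", "B"), ("A", "J"), ("B", "J"), ("J", "M"), ("J", "K"),
         ("J", "F"), ("J", "D"), ("F", "G")]
fig6: Graph = {}
for u, v in PAIRS:
    fig6.setdefault(u, set()).add(v)
    fig6.setdefault(v, set()).add(u)

ext_precinct, ext_view = vertex_extension(fig6, {"A", "B", "J"})
```

Here the precinct grows from 3 to 7 domains (M, K, F, and D are added) while the directed edge count grows only from 10 to 14, illustrating the factor-of-less-than-2 growth claimed above; (F, G) stays out of the view.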

The sixth viewserver scheme is referred to as full-QT. It is constructed in the same way as vertex-extension except that the locals scheme is used instead of the base scheme to define the P in the construction. In full-QT, during the query protocol all the viewservers in the source and the destination addresses are queried.

In  all  the  above  viewserver  schemes,  we  have  used  the  same  hierarchy  for  both  domain  classes 
and  viewservers.  In  practice,  not  all  domains  need  to  have  a  viewserver,  and  a  viewserver  hierarchy 
different from the domain class hierarchy can be deployed. However, there is an advantage to having a viewserver in each domain: source nodes do not require fixed domain-level source routes to their parent viewservers (in the view-query protocol). This reduces the amount of hand configuration required. In fact, the base scheme does not require any hand configuration: viewservers can decide their precincts from the intra-domain routing tables, and nodes can use intra-domain routes to reach parent viewservers.

Results  for  Internetwork  1 

The  parameters  of  the  first  internetwork  topology,  referred  to  as  Internetwork  1,  are  shown  in 
Table  1. 

Our evaluation measures were computed for a (randomly chosen but fixed) set of 1000 source-destination pairs. For brevity, we use spl to refer to the shortest-path length (i.e. the length of the shortest valid path in the internetwork topology). The minimum spl of these pairs was 2, the maximum spl was 13, and the average spl was 6.8. Table 2 lists for each viewserver scheme (1) the minimum, average and maximum precinct sizes, (2) the minimum, average and maximum merged view sizes, and (3) the minimum, average and maximum number of viewservers queried.

The precinct size indicates the memory requirement at a viewserver. More precisely, the memory requirement at a viewserver is O(precinct size × d), where d is the average number of neighbor domains of a domain, except for the vertex-extension and full-QT schemes. In these schemes, the memory requirement is increased by a factor less than two. Hence the vertex-extension scheme has the same order of viewserver memory requirement as the base scheme, and the full-QT scheme has the same order of viewserver memory requirement as the locals scheme.

Table 1: Parameters of Internetwork 1 (columns: Class i, No. of Domains, No. of Regions, % of Green Domains). Branching factor is 4 for all region classes.

Scheme            Precinct Size     Merged View Size     No. of Viewservers Queried
base              2 / 3.2 / 68      7 / 71.03 / 101      3 / 7.51 / 8
base-QT           2 / 3.2 / 68      30 / 76.01 / 101     8 / 8.00 / 8
locals            2 / 52.0 / 103    3 / 95.40 / 143      2 / 7.42 / 8
locals-QT         2 / 52.0 / 103    43 / 101.86 / 143    8 / 8.00 / 8
vertex-extension  3 / 19.2 / 796    23 / 362.15 / 486    3 / 7.51 / 8
full-QT           11 / 102.9 / 796  228 / 396.80 / 519   8 / 8.00 / 8

Table 2: Precinct sizes, merged view sizes, and number of viewservers queried for Internetwork 1 (each entry is minimum / average / maximum).

The merged view size indicates the memory requirement at a source; i.e. the memory requirement at a source is O(merged view size × d), except for the vertex-extension and full-QT schemes. Note that the source does not need to store information about red and non-transit domains. The numbers in Table 2 take advantage of this.

The  number  of  viewservers  queried  indicates  the  communication  time  required  to  obtain  the 
merged  view  at  the  source.  Because  the  average  spl  is  6.8,  the  “real-time”  communication  time 


required  to  obtain  the  merged  view  at  a  source  is  slightly  more  than  one  round-trip  time  between 
the  source  and  the  destination. 

As is apparent from Table 2, using a QT scheme increases the merged view size and the number of viewservers queried only by about 5%. Using a locals scheme increases the merged view size by about 30%. Using the vertex-extension scheme increases the merged view size about five-fold (note that the amount of actual memory needed increases only by a factor less than 2). The number of viewservers queried in the locals scheme is less than the number of viewservers queried in the base scheme. This is because the viewservers in the locals scheme have bigger precincts, and a path from the source to the destination can be found using fewer views.

Table 3 shows the average number of spl, spl + 1, and spl + 2 length paths found for a source-destination pair by the simple approach and by the viewserver schemes. All the viewserver schemes are very close to the simple approach. The vertex-extension and full-QT schemes are especially close (they found 98% of all paths). Table 3 also shows the number of pairs for which the viewserver schemes did not find a path (ranging from 1.4% to 5.9% of the source-destination pairs), and the number of pairs for which the viewserver schemes found longer paths. For these pairs, more viewserver addresses need to be tried. Note that the locals and vertex-extension schemes decrease the number of these pairs substantially (adding QT yields further improvement). Our policy constraints are independent of the source and destination domains. Hence, even a class 2 domain, if it is red, cannot carry traffic to a class 3 domain to which it is connected. We believe that these figures would improve with policies that depend on the source and destination domains.
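The percentages quoted above can be checked against Table 3. In this small sketch, “98% of all paths” is read as the ratio of a scheme's spl + 2 count to that of the simple approach, which is our interpretation; the no-path percentages divide by the 1000 source-destination pairs.

```python
def percent(part: float, whole: float) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


SIMPLE_SPL2 = 131.01  # simple approach, spl + 2 column of Table 3
PAIRS = 1000          # source-destination pairs evaluated

# Share of the simple approach's spl + 2 paths found by each scheme.
path_share = {s: percent(n, SIMPLE_SPL2)
              for s, n in {"vertex-extension": 128.19, "full-QT": 128.90}.items()}
# Share of pairs for which no path was found.
no_path_share = {s: percent(n, PAIRS) for s, n in {"full-QT": 14, "base": 59}.items()}
print(path_share, no_path_share)
```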

As is apparent from Table 3 and Table 2, the locals scheme does not find many more paths than the base scheme even though it has larger precinct and merged view sizes. Hence it is not recommended. The vertex-extension scheme is the best, but even base is adequate since it finds many paths.

We have repeated the above evaluations for two other internetworks and obtained similar conclusions. The results are in Appendix B.

                  Number of paths found
Scheme            spl     spl + 1   spl + 2   No. of pairs   No. of pairs
                                              with no path   with longer paths
simple            2.51    18.48     131.01    N/A            N/A
base              2.41    15.84     99.42     59             3 by 1.33 hops
base-QT                   15.86     100.16    54             3 by 1.33 hops
locals                                        29             3 by 1 hop
locals-QT                           105.02    20             3 by 1 hop
vertex-extension          18.38     128.19    22             0 by 0 hops
full-QT                             128.90    14             0 by 0 hops

Table 3: Number of paths found for Internetwork 1.

6  Concluding  Remarks

We presented a hierarchical inter-domain routing protocol that (1) satisfies policy and ToS constraints, (2) adapts to dynamic topology changes including failures that partition domains, and (3) scales well to a large number of domains.

Our  protocol  uses  partial  domain-level  views  to  achieve  scaling  in  space  requirement.  It  floods 
domain-level  topological  changes  over  a  flood  area  to  achieve  scaling  in  communication  requirement. 

It does not abstract domains into superdomains; hence it does not lose any domain-level detail in ToS and policy information. It merges a sequence of partial views to obtain domain-level source routes between nodes which are far away. The number of views that need to be merged is bounded by twice the number of levels in the hierarchy.

To evaluate and compare inter-domain routing protocols against each other and against the simple approach, we presented a model in which one can define internetwork topologies, policy/ToS constraints, inter-domain routing hierarchies, and evaluation measures. We applied this model to evaluate our viewserver hierarchy and compared it to the simple approach. Our results indicate that the viewserver hierarchy finds many short valid paths and reduces the memory requirement by two orders of magnitude.

Our  protocol  recovers  from  fail-stop  failures  of  viewservers  and  gateways.  When  a  viewserver 
fails,  an  address  which  includes  the  viewserver’s  id  becomes  useless.  This  deficiency  can  be  overcome 
by  replicating  each  viewserver  at  different  nodes  of  the  domain  (in  this  case  a  viewserver  fails  only 
if all nodes implementing it fail). This replication scheme requires viewserver ids to be independent of node ids, which can be easily accomplished.
For example, if node-ids of nodes implementing a viewserver share a prefix, this prefix can be used as the


The  only  drawback  of  our  protocol  is  that  to  obtain  a  domain-level  source  route,  views  are 
merged  at  (or  prior  to)  the  connection  (or  flow)  setup,  thereby  increasing  the  setup  time.  This 
drawback  is  not  unique  to  our  scheme  [8,  13,  6,  10]. 

There are several ways to reduce the setup overhead. First, domain-level source routes to frequently used destinations can be cached. The caching period would depend on the ToS requirements of the applications and the frequency of domain-level topology changes. For example, the period can be long for electronic mail, since it does not require shortest paths.
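Route caching could look like the following sketch: a cache of domain-level source routes keyed by destination and ToS class, with a lifetime per ToS class. The class name, the ToS labels, and the lifetimes are hypothetical; the text only says that the period depends on the ToS requirement and the frequency of topology changes. The clock is injectable so the expiry logic can be exercised deterministically.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class RouteCache:
    """Domain-level source routes keyed by (destination, ToS class)."""

    def __init__(self, ttl_by_tos: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_by_tos = ttl_by_tos     # hypothetical per-ToS lifetimes (seconds)
        self.clock = clock
        self.entries: Dict[Tuple[str, str], Tuple[List[str], float]] = {}

    def put(self, dest: str, tos: str, route: List[str]) -> None:
        expires = self.clock() + self.ttl_by_tos[tos]
        self.entries[(dest, tos)] = (route, expires)

    def get(self, dest: str, tos: str) -> Optional[List[str]]:
        hit = self.entries.get((dest, tos))
        if hit is None:
            return None
        route, expires = hit
        if self.clock() >= expires:      # stale: a fresh view query is needed
            del self.entries[(dest, tos)]
            return None
        return route
```

A long lifetime for a mail-like ToS class and a short one for delay-sensitive traffic follows the example in the text.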

Second,  views  of  frequently  queried  viewservers  can  be  replicated  at  “mirror”  viewservers  in  the 
source  domain.  A  viewserver  would  periodically  update  the  views  of  its  mirror  viewservers. 

Third, connection setup also involves traversing the name server hierarchy (to obtain destination addresses from their names). By integrating the name server hierarchy with the viewserver hierarchy, we may be able to do both operations simultaneously. This requires further investigation.
A  View-Update  Protocol  Event  Specifications

The events of gateway g are specified in Figure 7. When a gateway g recovers, CellId_g is set to nodeid(g). Thus, when g next executes Update_g, it sends an UpdateCell message to viewservers if it is still the minimum-id gateway in its cell, and a DeleteCell message otherwise.

The events of a viewserver x are specified in Figure 8. Note that when x adds an entry to DView_x (upon receiving an UpdateCell message), it selectively chooses a subset of neighbors from the cost set in the packet, to include only the neighbor domains which are in SView_x. When a viewserver x recovers, DView_x is set to {}. Its view becomes up-to-date as it receives new information from reporting gateways (and false information is removed after the time-to-die period).

Sending a DeleteCell message is essential because, prior to the failure, g may have been the smallest-id gateway in its cell. Hence, some viewservers may still contain an entry for its old domain cell.


Update_g {Executed periodically and also optionally upon a change in IntraDomainRT_g}

  {Determines the id of g's cell and initiates UpdateCell and DeleteCell messages if needed}
  OldCellId := CellId_g;
  CellId_g := compute cell id using LocalGateways_g and IntraDomainRT_g;
  if nodeid(g) = CellId_g then
      ncostset := compute costs for each neighbor domain cell using IntraDomainRT_g;
      flood_g((UpdateCell, domainid(g), CellId_g, Clock_g, FloodArea_g, ncostset));
  endif
  if nodeid(g) = OldCellId ≠ CellId_g then
      flood_g((DeleteCell, domainid(g), nodeid(g), Clock_g, FloodArea_g));
  endif

Receive_g(packet) {either an UpdateCell or a DeleteCell packet}
  flood_g(packet)

where procedure flood_g(packet)
  if domainid(g) ∈ packet.floodarea then
      {remove domain of g from the flood area to avoid infinite exchange of the same message}
      packet.floodarea := packet.floodarea − {domainid(g)};
      for all h ∈ LocalGateways_g ∪ LocalViewservers_g do
          Send(packet) to h using IntraDomainRT_g;
  endif
  for all h ∈ AdjForeignGateways_g ∧ domainid(h) ∈ packet.floodarea do
      Send(packet) to h;

Gateway Failure Model: A gateway can undergo failures and recoveries at any time. We assume failures are fail-stop (i.e. a failed gateway does not send erroneous messages). When a gateway g recovers, CellId_g is set to nodeid(g).

Figure 7: View-update protocol: Events of a gateway g.
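For readers who prefer executable form, the gateway events of Figure 7 can be transcribed into Python roughly as below. This is a sketch under assumptions, not the authors' implementation: `send` is a hypothetical transport hook, the cell id is taken to be the minimum id among g and the local gateways it can currently reach (the text identifies a cell by its minimum-id gateway but does not say how reachability is read from IntraDomainRT_g), and each recipient receives a shallow copy of the packet.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Set


@dataclass
class Packet:
    kind: str                        # "UpdateCell" or "DeleteCell"
    domain: str                      # domainid(g)
    cell: int                        # CellId_g (UpdateCell) or nodeid(g) (DeleteCell)
    timestamp: int                   # Clock_g when the packet was created
    floodarea: Set[str]              # domains the packet may still be flooded to
    costs: Optional[dict] = None     # neighbor-cell costs, UpdateCell only


@dataclass
class Gateway:
    node_id: int
    domain: str
    local_gateways: Set[int]                 # LocalGateways_g
    local_viewservers: Set[int]              # LocalViewservers_g
    adj_foreign: Dict[int, str]              # AdjForeignGateways_g: id -> its domain
    flood_area: Set[str]                     # FloodArea_g
    send: Callable[[int, Packet], None]      # hypothetical transport hook
    cell_id: float = float("inf")            # CellId_g, initially infinity
    clock: int = 0                           # Clock_g

    def flood(self, packet: Packet) -> None:
        if self.domain in packet.floodarea:
            # Remove own domain so the same message is not exchanged forever.
            packet.floodarea = packet.floodarea - {self.domain}
            for h in self.local_gateways | self.local_viewservers:
                self.send(h, replace(packet))
        for h, dom in self.adj_foreign.items():
            if dom in packet.floodarea:
                self.send(h, replace(packet))

    def receive(self, packet: Packet) -> None:
        self.flood(packet)

    def update(self, reachable_local_gateways: Set[int], costs: dict) -> None:
        old = self.cell_id
        # Assumption: cell id = minimum id among g and reachable local gateways.
        self.cell_id = min({self.node_id} | reachable_local_gateways)
        if self.node_id == self.cell_id:
            self.flood(Packet("UpdateCell", self.domain, self.cell_id,
                              self.clock, set(self.flood_area), costs))
        if self.node_id == old != self.cell_id:
            self.flood(Packet("DeleteCell", self.domain, self.node_id,
                              self.clock, set(self.flood_area)))

    def recover(self) -> None:
        self.cell_id = self.node_id      # CellId_g := nodeid(g) on recovery
```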


B  Results  for  Other  Internetworks 

Results  for  Internetwork  2 

The parameters of the second internetwork topology, referred to as Internetwork 2, are the same as the parameters of Internetwork 1 (a different seed is used for the random number generation).

Our evaluation measures were computed for a set of 1000 source-destination pairs. The minimum spl of these pairs was 2, the maximum spl was 13, and the average spl was 7.2.

Table 4 and Table 5 show the results. Conclusions similar to those for Internetwork 1 hold for Internetwork 2. In Table 5, the reason that the locals and QT schemes have more pairs with longer paths than the base scheme is that these schemes found some paths (which are not shortest) for some pairs for which the base scheme did not find any path.


Receive_x(UpdateCell, did, cid, ts, FloodArea, ncset)
  if did ∈ Precinct_x then
      if ∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x ∧ ts > timestamp then
          {received is more recent; delete the old one}
          delete (did:cid, timestamp, expirytime, deleted, ncostset) from DView_x;
      endif
      if ¬∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x then
          choose ncostset from ncset using SView_x;
          insert (did:cid, ts, Clock_x + TimeToDie_x, false, ncostset) to DView_x;
      endif
  endif

Receive_x(DeleteCell, did, cid, ts, floodarea)
  if did ∈ Precinct_x then
      if ∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x ∧ ts > timestamp then
          {received is more recent; delete the old one}
          delete (did:cid, timestamp, expirytime, deleted, ncostset) from DView_x;
      endif
      if ¬∃(did:cid, timestamp, expirytime, deleted, ncostset) ∈ DView_x then
          insert (did:cid, ts, Clock_x + TimeToDie_x, true, {}) to DView_x;
      endif
  endif

Delete_x {Executed periodically to delete entries older than the time-to-die period}
  for all (A:g, tstamp, expirytime, deleted, ncset) ∈ DView_x ∧ expirytime < Clock_x do
      delete (A:g, tstamp, expirytime, deleted, ncset) from DView_x;

Viewserver Failure Model: A viewserver can undergo failures and recoveries at any time. We assume failures are fail-stop. When a viewserver x recovers, DView_x is set to {}.

Figure 8: View update events of a viewserver x.
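Likewise, the viewserver events of Figure 8 can be sketched in Python. Assumptions: domain cells are (domain id, cell id) tuples, SView_x is reduced to the set of domain ids it contains, and the clock is a plain integer field; all class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Cell = Tuple[str, object]   # (domain id, cell id); the cell id may be "*"


@dataclass
class Entry:
    timestamp: int
    expirytime: int
    deleted: bool
    costs: Dict[Cell, float]            # neighbor domain cell -> cost


@dataclass
class Viewserver:
    precinct: Set[str]                  # Precinct_x (domain ids)
    sview_domains: Set[str]             # domains appearing in SView_x
    time_to_die: int                    # TimeToDie_x
    clock: int = 0                      # Clock_x
    dview: Dict[Cell, Entry] = field(default_factory=dict)   # DView_x

    def _make_room(self, key: Cell, ts: int) -> bool:
        """Drop a strictly older entry for key; True if key is now absent."""
        old = self.dview.get(key)
        if old is not None and ts > old.timestamp:
            del self.dview[key]          # received is more recent
        return key not in self.dview

    def receive_update_cell(self, did: str, cid, ts: int,
                            ncset: Dict[Cell, float]) -> None:
        if did in self.precinct and self._make_room((did, cid), ts):
            # Keep only neighbor cells whose domain is in SView_x.
            costs = {n: c for n, c in ncset.items() if n[0] in self.sview_domains}
            self.dview[(did, cid)] = Entry(ts, self.clock + self.time_to_die,
                                           False, costs)

    def receive_delete_cell(self, did: str, cid, ts: int) -> None:
        if did in self.precinct and self._make_room((did, cid), ts):
            self.dview[(did, cid)] = Entry(ts, self.clock + self.time_to_die,
                                           True, {})

    def delete_expired(self) -> None:
        """Delete_x: drop entries whose time-to-die period has ended."""
        for key in [k for k, e in self.dview.items() if e.expirytime < self.clock]:
            del self.dview[key]

    def recover(self) -> None:
        self.dview = {}                  # DView_x := {} on recovery
```

As in the figure, a packet whose timestamp is not newer than the stored entry leaves DView_x unchanged.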


Scheme            Precinct Size     Merged View Size     No. of Viewservers Queried
base              2 / 3.2 / 76      4 / 66.62 / 96       3 / 7.55 / 8
base-QT           2 / 3.2 / 76      29 / 72.76 / 96      8 / 8.00 / 8
locals            3 / 69.8 / 149    4 / 101.32 / 148     2 / 7.36 / 8
locals-QT         3 / 69.8 / 149    35 / 110.32 / 152    8 / 8.00 / 8
vertex-extension  3 / 19.47 / 817   15 / 339.60 / 469    3 / 7.55 / 8
full-QT           11 / 135.2 / 817  186 / 402.51 / 503   8 / 8.00 / 8

Table 4: Precinct sizes, merged view sizes, and number of viewservers queried for Internetwork 2 (each entry is minimum / average / maximum).

                  Number of paths found
Scheme            spl     spl + 1   spl + 2   No. of pairs   No. of pairs
                                              with no path   with longer paths
simple            2.21    13.22     74.30     N/A            N/A
base              1.98    8.20      34.40     123            13 by 1.08 hops
base-QT           1.98    8.36      35.62     110            15 by 1.13 hops
locals            2.08    9.18      40.50     97             23 by 1.39 hops
locals-QT         2.08    9.38      42.08     67             23 by 1.30 hops
vertex-extension  2.18    12.57     64.98     19             6 by 1 hop
full-QT           2.19    12.85     67.37     4              4 by 1 hop

Table 5: Number of paths found for Internetwork 2.

Results for Internetwork 3

The parameters of the third internetwork topology, referred to as Internetwork 3, are shown in Table 6. Internetwork 3 is more connected, more class 0, 1 and 2 domains are green, and more class 3 domains are red. Hence, we expect more valid paths between source and destination pairs.

Our  evaluation  measures  were  computed  for  a  set  of  1000  source-destination  pairs. '  The  mini¬ 
mum  spl  of  these  pairs  was  2,  the  maximum  spl  was  10,  and  the  average  spl  was  5.93. 
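The term spl is not defined in this excerpt; the sketch below assumes it denotes the shortest path length, in domain-level hops, between a source and a destination, and shows how the minimum, maximum and average spl over a set of pairs could be computed with breadth-first search. The function names and the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Graph = Dict[str, List[str]]  # adjacency lists of the domain-level graph


def spl(graph: Graph, src: str, dst: str) -> Optional[int]:
    """Hop count of a shortest src->dst path, or None if dst is unreachable."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == dst:
                    return dist[v]  # BFS first reaches dst along a shortest path
                queue.append(v)
    return None


def spl_summary(graph: Graph, pairs: Iterable[Tuple[str, str]]):
    """(min, max, average) spl over reachable pairs, plus the unreachable count."""
    lengths: List[int] = []
    unreachable = 0
    for s, t in pairs:
        d = spl(graph, s, t)
        if d is None:
            unreachable += 1
        else:
            lengths.append(d)
    if not lengths:
        return None
    return min(lengths), max(lengths), sum(lengths) / len(lengths), unreachable


g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
print(spl_summary(g, [("a", "c"), ("a", "d"), ("b", "d"), ("a", "e")]))
```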


          No. of    No. of    % of Green   Edges between classes i and j
Class i   Domains   Regions   Domains      Class j   Local   Remote   Far
0         10        4         0.85         0         8       7        0
1         100       16        0.80         0         190     20       0
                                           1         50      20       0
2         1000      64        0.75         0         500     50       0
                                           1         1200    100      0
                                           2         200     40       0
3         10000     256       0.10         0         300     50       0
                                           1         250     100      0
                                           2         10250   150      50
                                           3         200     150      100

Table 6: Parameters of Internetwork 3.

Branching factor is 4 for all domain classes.
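For experimentation, Table 6 can be re-encoded as plain Python data. The sketch below is hypothetical (the variable and function names are not from the report); it only transcribes the table and totals its columns, without assuming how the parameters are used to generate the topology.

```python
# Table 6 (Internetwork 3), transcribed. Per-class columns:
DOMAINS = {0: 10, 1: 100, 2: 1000, 3: 10000}
REGIONS = {0: 4, 1: 16, 2: 64, 3: 256}
GREEN = {0: 0.85, 1: 0.80, 2: 0.75, 3: 0.10}  # "% of Green Domains" as printed

# Edges between classes i and j: (class i, class j) -> (local, remote, far).
EDGES = {
    (0, 0): (8, 7, 0),
    (1, 0): (190, 20, 0),
    (1, 1): (50, 20, 0),
    (2, 0): (500, 50, 0),
    (2, 1): (1200, 100, 0),
    (2, 2): (200, 40, 0),
    (3, 0): (300, 50, 0),
    (3, 1): (250, 100, 0),
    (3, 2): (10250, 150, 50),
    (3, 3): (200, 150, 100),
}


def total_domains() -> int:
    return sum(DOMAINS.values())


def edges_by_kind() -> tuple:
    """Column totals (local, remote, far) over all class pairs."""
    return tuple(sum(column) for column in zip(*EDGES.values()))


def total_edges() -> int:
    return sum(edges_by_kind())


print(total_domains(), edges_by_kind(), total_edges())
```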


Table 7 and Table 8 show the results. The conclusions drawn for Internetworks 1 and 2 also hold for
Internetwork 3.


Scheme             Precinct Size       Merged View Size       No. of Viewservers Queried
base               2 / 3.5 / 171       5 / 134.41 / 206       3 / 7.26 / 8
base-QT            2 / 3.5 / 171       55 / 154.51 / 206      8 / 8.00 / 8
locals             3 / 70.17 / 171     4 / 164.16 / 257       2 / 7.09 / 8
locals-QT          3 / 70.17 / 171     57 / 191.06 / 258      8 / 8.00 / 8
vertex-extension   5 / 34.17 / 1986    18 / 601.56 / 695      3 / 7.26 / 8
full-QT            14 / 155.5 / 1986   503 / 655.79 / 743     8 / 8.00 / 8

Table 7: Precinct sizes, merged view sizes, and number of viewservers queried for Internetwork 3.


                   Number of paths found        No. of pairs   No. of pairs
Scheme             spl     spl + 1   spl + 2    with no path   with longer paths
simple             ?       ?         368.97     N/A            N/A
base               2.83    24.25     178.08     17             11 by 1.09 hops
base-QT            2.87    25.53     193.41     12             8 by 1.12 hops
locals             2.87    25.62     196.33     21             8 by 1 hop
locals-QT          2.97    27.59     219.63     2              6 by 1 hop
vertex-extension   3.32    35.73     332.54     5              1 by 1 hop
full-QT            3.33    36.47     346.44     0              0 by 0 hops

Table 8: Number of paths found for Internetwork 3.


Figures 9 through 11 show the number of spl, spl + 1 and spl + 2 length paths found by
the schemes as a function of spl (we only show results for spl values for which more than 10 pairs
were found). We do not include the base-QT, locals and locals-QT schemes since their results are very
close to those of the base scheme. As expected, as spl increases, the number of paths for a
source-destination pair increases, and so does the gap between the simple scheme and the viewserver schemes.
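Table 5 does not break the results down by spl, but its columns show a related trend for Internetwork 2. The sketch below, whose names are hypothetical, divides each viewserver scheme's entries in Table 5 by the corresponding entries of the simple scheme; for every scheme the ratio falls from the spl column to the spl + 2 column, so the gap widens as longer paths are counted. Table 8 is not used because two of the simple scheme's entries there are illegible.

```python
# Number of paths found, Internetwork 2 (Table 5): columns spl, spl + 1, spl + 2.
TABLE5 = {
    "simple":           (2.21, 13.22, 74.30),
    "base":             (1.98, 8.20, 34.40),
    "base-QT":          (1.98, 8.36, 35.62),
    "locals":           (2.08, 9.18, 40.50),
    "locals-QT":        (2.08, 9.38, 42.08),
    "vertex-extension": (2.18, 12.57, 64.98),
    "full-QT":          (2.19, 12.85, 67.37),
}


def fraction_of_simple(scheme: str) -> tuple:
    """Entries of `scheme` divided by those of the simple scheme, column by column."""
    return tuple(s / b for s, b in zip(TABLE5[scheme], TABLE5["simple"]))


for name in TABLE5:
    if name != "simple":
        print(f"{name:17s}", " ".join(f"{r:.3f}" for r in fraction_of_simple(name)))
```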


